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ABSTRACT 


We propose new nonparametrie statistical tests to identify whether each element 
in a sequence of independent multivariate observations is drawn from a common 
probability distribution or if some distributional change has occurred over the course of 
the sequence. Each test is formulated using matching techniques based on distances 
between observations. These tests are capable of detecting changes of quite general 
nature, and, unlike most similar tests, they require no distribution assumptions or any 
prior separation of the data into hypothetical pre- and post-change subsets. We derive a 
central limit theorem for one of the tests and an exact distribution for another. A third 
culminating test, which is a cumulative sum of statistics on a collection of orthogonal 
matchings associated with the observation sequence, exhibits noteworthy power to detect 
whether a distributional change has occurred. We examine the performance of the tests 
by computer simulation and compare results to a state-of-the-art parametric competitor. 


V 



THIS PAGE INTENTIONALLY LEET BLANK 


VI 



TABLE OF CONTENTS 


I. INTRODUCTION.1 

II. PROBLEM BACKGROUND AND LITERATURE REVIEW.5 

A. PROBLEM FORMULATION.5 

1. Change Points.5 

2, Taxonomy of Change-Point Problems.6 

B. PARAMETRIC APPROACHES.8 

1. Univariate Case.8 

2, Multivariate Case.8 

C. NONPARAMETRIC APPROACHES.10 

1. Univariate Case.10 

2. Multivariate Case.12 

a. General Approaches . 12 

b. Graph-Theoretic Approaches and Matching . 13 

III. THEORETICAL RESULTS.29 

A. THE SUM OF PAIR-MAXIMA TEST.30 

1. The Statistic.30 

2. Expected Value and Variance.30 

3. Using Stein’s Method to Estahlish Asymptotic Normality.31 

4. An Improvement to the Normal Approximation hy Edgeworth 

Expansion.36 

5. Treatment of Odd Sample Sizes.39 

6. On the Consistency of .40 

7. A Graphical Example.45 

B. THE NON-BIPARTITE ACCUMULATED PAIRS TEST.48 

1. The M^ Statistic.48 

2. Critical Envelope.49 

3. A Graphical Example.53 

C. THE ENSEMBLE SUM OF PAIR-MAXIMA TEST.56 

1. Orthogonal Successive Optimal Matchings.56 

2. The Statistic.60 

3. Brownian Bridge Motivation for .63 

D. THE BIPARTITE ACCUMULATED PAIRS TEST.73 

1. The Z Statistic.73 

2. Critical Envelope.74 

3. A Graphical Example.76 

IV. SIMULATION STUDY.81 

A. METHODOLOGY.81 

B. PERFORMANCE RESULTS.88 

1. Multivariate Normal Case.89 

vii 









































a. Changes in Location . 89 

b. Changes in Scale . 91 

2. Multivariate Normal Mixture Case.93 

a. Changes in Location . 93 

b. Changes in Scale . 94 

3. Multivariate Weibull Case.95 

C. DIFFERENT COST FUNCTIONS.97 

D, COMPUTATIONAL DETAILS.99 

V. CONCLUSIONS AND OPPORTUNITIES FOR FURTHER RESEARCH.101 

APPENDIX A: HIGHER MOMENT DERIVATIONS.107 

A. MEAN AND VARIANCE OF .107 

B. COVARIANCE OF AND Y^ .108 

C. THIRD MOMENTS OF , Fj, AND Y, .109 

APPENDIX B: QUANTILE TABLES.113 

A, APPROXIMATE CRITICAL VALUES FOR .113 

B. APPROXIMATE CRITICAL VALUES FOR .114 

APPENDIX C: R FUNCTIONS FOR CRITICAL ENVELOPES.117 

A. CRITICAL ENVELOPE FOR THE NAP TEST.117 

B, CRITICAL ENVELOPE FOR THE BAP TEST.118 

APPENDIX D: EXAMPLE DATA FOR FIGURES 1-7.119 

LIST OF REFERENCES.121 

INITIAL DISTRIBUTION LIST.127 


viii 

























LIST OF FIGURES 


Figure 1. Minimum-weight bipartite matching on 20 points; m=8, n=\2 .21 

Figure 2. Minimum-weight bipartite matching on 20 points; m=10, n=10.21 

Figure 3. Minimum-weight non-bipartite matching on 20 points.22 

Figure 4. Rosenbaum’s cross-match statistic with no change point.25 

Figure 5. Rosenbaum’s cross-match statistic with change point.25 

Figure 6. Minimum weight non-bipartite matching on 20 points with no change in 

underlying distribution with respect to observation order.46 

Figure 7. Minimum weight non-bipartite matching on 20 points with a change in 

underlying distribution with respect to observation order.47 

Figure 8. Critical envelope for the NAP test and two cases of with N = 200 

and a = 0.05 . Reject the null hypothesis if exceeds q“ for some k. ..53 

Figure 9. NAP test statistic for Figure 6 data, no change point detected.54 

Figure 10. NAP test statistic for Figure 7 data, change point detected.55 

Figure 11. Example for which no more than N! 2 matchings may be obtained.57 

Figure 12. Example for which exactly N - \ matchings may be obtained.58 

Eigure 13. Eive orthogonal successive optimal matchings on N = 6 observations 
with associated Latin square. Cost function does not necessarily satisfy 
the triangle inequality.59 

Eigure 14. Quantile-Quantile plot of —l)) for 10,000 simulations of N 

observations from a standard bivariate normal distribution; A = 300, 

k = \ .68 

Eigure 15. Quantile-Quantile plot of —l)) for 10,000 simulations of N 

observations from a standard bivariate normal distribution; A = 300, 

k = \0 .69 

Eigure 16. Quantile-Quantile plot of fi^(A:/(A —l)) for 10,000 simulations of A 

observations from a standard bivariate normal distribution; A = 300, 

k = 30 .70 

Eigure 17. Quantile-Quantile plot of fi^(A:/(A —l)) for 10,000 simulations of A 

observations from a standard bivariate normal distribution; A = 300, 

k = m .70 

Eigure 18. Quantile-Quantile plot of fi^(A:/(A —l)) for 10,000 simulations of A 

observations from a standard bivariate normal distribution; A = 300, 

/t = 150.71 

Eigure 19. Minimum weight bipartite matching on 20 points; m = 4 and n = \6 with 
no change point. The control set is circled; line segments connect 

observations that are paired in the optimal bipartite matching.77 

Eigure 20. Minimum weight bipartite matching on 20 points; m = 4 and n = \6 with 
a change point r = 13. The control set is circled; post-change point 


IX 





















observations are boxed. Line segments eonneet observations that are 

paired in the optimal bipartite matehing.78 

Figure 21. Typieal ehange seenario: mean jump at r = 101. The mean jumps in the 

first dimension only and remains fixed in all other dimensions.84 

Figure 22. Typieal ehange seenario: mean drift at r = 101. The mean drifts in the 

first dimension only and remains fixed in all other dimensions.84 

Figure 23. Typieal ehange scenario: mean jump at r = 161. The mean jumps in the 

first dimension only and remain s fixed in all other dimensions.85 

Figure 24. Typical change scenario: mean drift at t = 161. The mean drifts in the 

first dimension only and remain s fixed in all other dimensions.85 

Figure 25. Typical change scenario: scale jump at t = 101. The scale jumps in all 

dimensions.86 

Figure 26. Typical change scenario: scale drift at t = 101. The scale jumps in all 

dimensions.86 

Figure 27. Typical change scenario: scale jump at t = 161. The scale jumps in all 

dimensions.87 

Figure 28. Typical change scenario: scale drift at t = 161. The scale jumps in all 

dimensions.87 

Figure 29. Effect of mixing proportion on JJS false alarm rate for MVN mixture 
cases with dimension d = 2, 5, and 20 based on = 200 , a = 0.05 , and 
10,000 simulations. Scale of mixing = 4; proportion of mixing 
varies along the x-axis.94 


X 













LIST OF TABLES 


Table 1. Critical values of the statistic for A=200 estimated by 10,000 

simulations, Edgeworth expansion, and normal approximation.38 

Table 2. Cost matrix for the regular hexahedron in Figure 11.58 

Tables. Achieved test levels for max^gj^j ^^ 2 } —l)) using Brownian 

bridge critical values for 10,000 simulations of N observations from a 
standard bivariate normal distribution. Achieved test level for each 
combination of N and a is the fraction of simulations for which 

^^/2} .72 

Tabled. Critical envelope and BAP test statistic Z for Figure 19 data. Zj. 

never exceeds q ^., so the null hypothesis of no change is not rejected.77 

Tables. Critical envelope and BAP test statistic Z for Figure 20 data. 

Zg > <^6, so the null hypothesis of no change is rejected.79 

Table 6. Test power to detect a mean change of magnitude A for MVN case with 
dimension d = 5 and change point t = 101 based on N = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right.90 

Table 7. Test power to detect a mean change of magnitude A for MVN case with 
dimension J = 20 and change point t = 101 based on A = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right.90 

Table 8. Test power to detect a mean change of magnitude A for MVN case with 
dimension d = 5 and change point t = 161 based on A = 200, 

a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right.91 

Table 9. Test power to detect a scale change of magnitude A for MVN case with 
dimension d = 5 and change point t = 101 based on A = 200, 

a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right.92 

Table 10. Test power to detect a scale change of magnitude A for MVN case with 
dimension d = 5 and change point t = 161 based on A = 200, 

a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right.92 

Table 11. Test power to detect a mean change of magnitude A for MVN mixture 
case with dimension d = 5 and change point r = 101 based on A = 200 , 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 
the right. Shading indicates excessive false alarm rate.93 


XI 














Table 12. 


Table 13. 


Table 14. 


Table 15. 


Table 16. 


Table 17. 


Table 18. 

Table 19. 
Table 20. 
Table 21. 


Test power to deteet a seale ehange of magnitude A for MVN mixture 
ease with dimension d = 5 and ehange point r = 101 based on = 200 , 

a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right. Shading indicates excessive false alarm rate.95 

Test power to detect a change in the scale parameter /J of magnitude A 
for MV Weibull case with dimension J = 5 and change point t = 101 
based on V = 200 , a = 0.05 , and 1000 simulations. Jump case is to the 
left; drift case is to the right. Shading indicates excessive false alarm rate. ...96 
Ensemble Sum of Pair-Maxima (ESPM) test power to detect a mean 
change of magnitude A for MVN case with dimension d = 5 and change 
point T = 101 under different cost functions, based on V = 200, 

a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right.97 

Ensemble Sum of Pair-Maxima (ESPM) test power to detect a mean 
change of magnitude A for MVN mixture case with dimension d = 5 and 
change point r = 101 under different cost functions, based on V = 200 , 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to 

the right.98 

Ensemble Sum of Pair-Maxima (ESPM) test power to detect a change in 
the scale parameter /J of magnitude A for MV Weibull case with 
dimension J = 5 and change point r = 101 under different cost functions, 
based on V = 200 , a = 0.05 , and 1000 simulations. Jump case is to the 

left; drift case is to the right.98 

Typical time to compute V / 2 orthogonal successive optimal matchings 
calling Kolmogorov’s algorithm (modified to compute orthogonal 

successive optimal matchings) in R.99 

Typical time to compute V / 2 orthogonal successive optimal matchings 

using Derigs’s algorithm in R.100 

Estimated critical values for .113 

Approximate critical values for .115 

Example data for Eigures 1-7.119 


xii 












LIST OF ABBREVIATIONS 


BAP 

Bipartite Accumulated Pairs 

CUSUM 

Cumulative sum 

ED 

Euclidean distance 

ESPM 

Ensemble Sum of Pair Maxima 

EWMA 

Exponentially weighted moving average 

GMA 

Geometric moving average 

JJS 

James, James, and Siegmund 

MST 

Minimum spanning tree 

MD 

Mahalanobis distance 

MD-R 

Mahalanobis distance, robust 

NAP 

Non-Bipartite Accumulated Pairs 

NNVE 

nearest-neighbor variance estimation 

RD 

Multivariate rank distance 

SPM 

Sum of Pair Maxima 


xiii 



THIS PAGE INTENTIONALLY LEET BLANK 


XIV 



ACKNOWLEDGMENTS 


This work is a product of the investments of many. I express my deepest 
gratitude to those below, without whom this effort would never have succeeded; 

I thank God above all for providing me the opportunity to do this work and for 
granting me all things necessary to bring it to eompletion. 

I thank my wife, Leticia, whose unwavering love, prayer, understanding, and 
encouragement have been vital to this and any other accomplishment of my adult life. 
Thanks also to my children, Christian, Eliana, and Elizabeth, for giving my heart cheer 
during this labor. 

I thank my parents, Michael and Barbara Ruth, who have taught me what it means 
to work hard, equipped me to meet challenges, and supported me in everything I aspire to 
do. 

Thanks to my advisor. Professor Robert Koyak, whose insight, vision, 
supervision, and guidance have not only made this product what it is, but have shaped me 
immensely as an academic professional. 

I appreciate the contributions of my other committee members. Professors Kyle 
Ein, Javier Salmeron, Craig Rasmussen, and Eyn Whitaker, who have both improved this 
work with their input and improved me through their instruction and assistance. 

My special thanks go to Professor Samuel Buttrey, who invested substantial time 
and effort to help me implement state-of-the-art computational routines in order to 
accomplish the numerical analysis in this study. 

I am grateful to Dr. Eric Bechhoefer for his advice and thoughtful contributions to 
my research and to Goodrich Euel and Utility Systems for providing funding to support 
this effort. 

I appreciate Dr. Jeffery Haferman and Eric Adint at the Naval Postgraduate 
School’s High Performance Computing Center for their assistance to me as a Hamming- 
eluster user for my simulation work. 

Thanks to the staff at the Dudley Knox Eibrary, who provided me valuable 
research assistance to complete this work. 


XV 



Thanks also to Professor Paul Rosenbaum at the Wharton School, University of 
Pennsylvania, whose contributions in this line of research strongly motivated our work; 
to Professor David Aldous at the University of California at Berkeley, who suggested 
Stein’s Theorem to us as a possible approach to proving the central limit theorem 
contained herein; and to Professor Bo Lu at the Ohio State University, who very 
graciously provided us with efficient computational routines to implement some of our 
ideas. 

Finally, I am thankful for the support of my office-mates over the last three years; 
Hiro Sato, Ali Al-Rowaei, Eng Yau Pee, Jay Foraker, and Anthony Tvaryanas. They not 
only sharpened my thinking, but also made this labor more fulfilling by their friendship. 



EXECUTIVE SUMMARY 


Given a sequence of observations, has a change occurred in the underlying 
probability distribution with respect to observation order? This question arises in a 
variety of applications, including quality control, machinery health diagnosis, 
biosurveillance, and image analysis. This problem is encountered throughout statistical 
literature and is often referred to as “the change-point problem,” where “change point” 
refers to the index of the first observation for which the underlying probability 
distribution is different from that of previous observations. Detecting change points in 
high-dimensional settings is challenging, and most change-point methods for multi¬ 
dimensional problems rely heavily upon distributional assumptions such as multivariate 
normality or the use of observation history to model probability distributions. In practice, 
such strong distributional assumptions are often invalid and can lead to poor detection 
performance, and useful observation histories are often unavailable. Also, most change- 
point methods are applicable only to changes of a specific type (for example, an abrupt 
change in distribution mean) when in many cases one is interested in detecting more 
general types of change as well (for example, changes in scale or gradual changes). 

We propose new nonparametric statistical tests to detect the presence of a change 
point in a sequence of multivariate data based on the graph-theoretic concept of 
matching. Each test requires only the assumption of some reasonable function to 
measure dissimilarity between observations. We state the change-point problem by 
representing the observation set as a complete graph in which each observation is a vertex 
and each pair of vertices is connected by an edge whose weight is the dissimilarity 
between the two vertices. Then we pair observations together in such a way as to 
minimize the total cost of the collection of pairs; this collection of pairs is called a 
matching. Our statistical tests for a change point use the fact that if a change has 
occurred with respect to order in the underlying distribution of a sequence of 
observations then the sequence labels of the pairs in the matching are closer together on 
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average than if no distributional change has occurred. By considering not just the lowest- 
cost matching but rather several low-cost matchings, we achieve considerable power to 
detect whether there is a change point in a sequence of observations. 

We examine the performance of these tests by simulation in various change-point 
settings considering different underlying probability distribution family, dimensionality, 
change point location, change parameter (distribution location or scale), type of change 
(abrupt or gradual), and magnitude of change. Each test demonstrates power to detect a 
change point at fixed false alarm rate in every case examined. The most powerful of 
these tests is the Ensemble Sum of Pair-Maxima (ESPM) test, which computes the 
cumulative sum of the larger sequence labels among all pairs in a collection of low-cost 
matchings and measures the deviation of this sum from its expected value. The ESPM 
test has change detection power comparable to a state-of-the-art parametric competitor, 
the maximum likelihood ratio test of James, James, and Siegmund (JJS), even when the 
parametric assumptions for that test are met. When those assumptions are not met, the 
ESPM test retains noteworthy power to detect a change point, while the false alarm rate 
of the JJS test increases. 



I. INTRODUCTION 


The problem of identifying a change in a stochastic process is often referred to as 
“the change-point problem” and has been a subject of enduring interest in statistical 
literature. A simple statement of this problem is, “Given a sequence of observations, has 
a change occurred in the underlying probability distribution with respect to observation 
order?” This problem arises in a variety of applications, such as; 

• Quality control. Samples are taken from a particular manufacturing process, 
perhaps over time or across different stages in the process, that carry information 
regarding the quality of the end product. Change-point methods are used to indicate if, 
where, or when the process departs from an “in-control” condition. 

• Machinery diagnostics and fault detection. Consider a complex machine for 
which various measurements are taken during its operation that provide an indication of 
machine health. A change in the distribution of these measurements with respect to time 
might indicate some form of health degradation. 

. Biosurveillance. Suppose occurrences of a particular disease are cataloged by 
geographic location and monitored over time. Change point methods may be used to 
detect whether a disease outbreak has occurred. 

• Image analysis. Consider a sequence of images of the same scene taken at 
different times; for example, satellite images of some geographic area or magnetic 
resonance imaging scans on an individual. Evidence that the scene is changing in some 
significant way may be found by means of change point analysis. 

This research is about detecting whether change has occurred. Specifically, we 
investigate nonparametric tests to detect change points of very general nature in 
multivariate data. Cases of “very general nature” include those of abrupt or gradual 
changes in the mean of a distribution, or changes in its scale. While many real-life 
processes exhibit gradual change, fairly little investigation has been done in the area of 
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detecting gradual changes in multivariate data. Furthermore, a relatively small body of 
work exists proposing nonparametric solutions to multivariate change detection 
problems, although interest in this area appears to be growing. 

We examine nonparametric methods that rely on matching, which involves the 
pairing together of observations based on some measure of dissimilarity. Existing 
statistical applications of matching techniques include; 

• assessing sensor accuracy in test and evaluation of radar and joint-tracking 
systems by pairing detected objects with truth objects, 

• comparing the accuracy of different methods for estimating the locations of 
impact points in munitions testing by pairing estimated locations to “ground truth,” and 

. pairing subjects based on similarity measures for clinical trials and observational 
studies, 

to name a few. In this paper, we introduce new methods of this type that prove to be 
powerful to detect change over a wide range of alternative hypotheses. 

Our work is organized as follows: In Chapter II we classify the field of change- 
point problems into its two main categories, sequential and non-sequential techniques, 
with a formal discussion of how problems in each category are framed. We then 
summarize our review of the literature in this field, with particular emphasis on 
nonparametric approaches, followed by a graph-theoretic overview of matching. The 
chapter concludes with a review of the most recent work on this problem based on non- 
bipartite matching (defined in the next chapter), which is our primary area of interest 
here. In Chapter III, we propose new statistical tests based on non-bipartite matching. 
First, we introduce the Sum of Pair-Maxima (SPM) test and the Non-Bipartite 
Accumulated Pairs (NAP) test, and develop the supporting theory for these tests in some 
detail. Of primary importance are the proof of a central limit theorem for the SPM test 
and the derivation of the exact distribution for the NAP test. Then we introduce the 
dominant test of this paper, the Ensemble Sum of Pair-Maxima (ESPM) test, which is an 
extension of the SPM test that involves extracting additional change-point information 
from orthogonal matchings. Finally, we present the Bipartite Accumulated Pairs (BAP) 
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test as an application of bipartite matching techniques to change-point problems where 
some observation history is available. Chapter IV demonstrates the performance of the 
SPM, NAP, and ESPM tests by means of a simulation study. The power of these tests is 
compared to a state-of-the-art parametric competitor, the maximum likelihood ratio test 
of James, James, and Siegmund (1992), for various cases including different underlying 
distributions, different types and magnitudes of change, and different dissimilarity 
measures. Chapter V summarizes our findings and outlines opportunities for further 
research in this field. 
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II. PROBLEM BACKGROUND AND LITERATURE REVIEW 


A. PROBLEM FORMULATION 

This research addresses the following specific question; Given a sequence of 
independent multivariate observations, is the sample statistically homogeneous? In other 
words, has the underlying probability distribution from which the observations were 
drawn remained constant or has it changed"? As discussed in the previous chapter, this 
problem emerges in a wide variety of applications and is traditionally referred to as “the 
change-point problem.” 

1. Change Points 

We define the term change point as follows: Given a sequence of independent 
random variables (Xj,Aj,...,let F. denote the probability distribution of X.. 

Then an integer tg{ 2,...,A} is a change point with respect to measure 6 if 

Fj =^2 =--- = F^_i, F^_j^F^ and ^(Fi,F.)-max^g^^_ .j^(f^,fJ is strictly positive 

over 7 g{t,...,A}, where h is a measure of distance between two probability 
distributions. The Fj {j>T) are not necessarily distinct from one another; for example, 

the distribution change at r may be associated with an abrupt mean change, or “mean 
jump.” Or Fj may some simple function of j>T; for example, the distribution change 

beginning at r may be associated with a gradual mean change, or “mean drift.” More 
complicated forms for Fj are allowed as well. A formulation of the general change-point 

problem in a hypothesis-testing framework with respect to observations Aj, Aj,..., A^ 

consists of defining null hypothesis 

(2.1) H,: F,=F,= - = F, 

and corresponding alternative hypothesis 
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//j: There exists an integer r, 2<Tq<t <t^<N , such that 
(2.2) F,=¥,=■■■ = F^_, ^ F^ , and 

h (Fj , ^ (f^ , fJ is strictly positive over ; e {r,..., A^}. 

Usually we take Tq = 2 and f = N but in some cases may wish to restrict the change 
point to a narrower interval. 

An important taxonomy exists within the family of change-point problems; we 
present a particular classification scheme here, similar to Brodsky and Darkhovsky 
(1993) and Basseville and Nikiforov (1993), to serve as a concept map of sorts, to 
provide a framework for review of the literature in this field, and to make clear the 
classification of the problem for which we offer solutions. 

2. Taxonomy of Change-Point Problems 

Change-point problems can be classified into the two broad categories of 
sequential and nonsequential analysis. Sequential analysis involves detecting the 
occurrence of a change while monitoring a system “on-line;” that is, data collection is 
ongoing in time and analysis is performed sequentially as the data set is updated. For 
such cases, the null hypothesis is tested over and over again as new data are added to the 
set of observations as follows: 

1) With observations Aj,...,A,_j on hand, add A,. 

2) Test for a change point in {2,...,t} . 

a) If a change point is detected, perform some predetermined action. 

b) If not, go back to step (1) with Aj,..., A, on hand. 

A classic application of sequential analysis is in the arena of quality control, where some 
particular process is ongoing in time and process outputs are sampled and tested in 
sequence to identify if some undesirable change in the process has occurred. 
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In contrast, non-sequential analysis involves the examination of a finite sequence 
of data “off-line” with the purpose of identifying whether or not some ehange has 
oeeurred during the observation period. Non-sequential analysis can be divided further 
into the eases of a single test or simultaneous tests. For a single test, one eonsiders a 
sequence of multivariate observations and a known or assumed time 

rG{2,...,N} and then tests the null hypothesis : F^ = F^=-= Fj^ against the 
alternative F^ = F^=-- = F^_^ ^ F^= = ■■■ = F^ (or perhaps a one-sided 

alternative). One example of this type of problem is the elinieal trial seenario, where two 
groups of subjeets are drawn from some common population, one group is administered a 
treatment and the other a placebo, and the problem is to test whether the treatment has 
some particular effect. In the single test framework, N specimens are split into a control 
and a test group with {Xj,...,X^_j} and {X^,...,X^} being the two associated sets of 

observations (the order of observations within groups does not matter for this case), and a 
single hypothesis test is performed regarding r (hence the name “single test”). 

For simultaneous tests, no speeifie eandidate for ehange point r is assumed; the 
null hypothesis (2.1) is tested against alternative (2.2) as stated. Such tests are 
simultaneous in the sense that they involve testing all possible change points 
simultaneously. An example of sueh a test relates to maehinery health management; 
Consider a military aircraft with an on-board device that records various measurements 
on the aircraft with respect to time during flight; these measurements are known or 
believed to be indicators of aircraft health. Assume for the sake of this example that the 
observations are independent with respeet to time. Let (Aj, Aj,..., be the sequenee 

of these observations. When the aireraft returns from its mission, these observations are 
analyzed for evidence of health degradation. That is, do the observations provide 
evidenee of a ehange from some “healthy” distribution to a “less healthy” one? If so, 
may one infer when the degradation oeeurred (or began to oeeur)? 

Machinery health diagnosis and prognosis problems are strong motivations for 

our research effort; consequently, the foeus of this research is non-sequential 

simultaneous testing. Our literature survey finds that powerful nonparametric tests for 
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multivariate change-point problems are not abundantly available, particularly in the non¬ 
sequential simultaneous testing case. We now proceed to review several parametric and 
nonparametric approaches to univariate and multivariate change-point problems, 
concluding with a discussion of the graph-theoretic concept of matching and its 
application to such problems. This will set the stage for our introduction in the next 
chapter of new matching-based solutions to the change-point problem. 

B. PARAMETRIC APPROACHES 

1. Univariate Case 

The two-sample Mest, described in Tanis and Hogg (2008), is perhaps the most 
widely known test for heterogeneity, though it generally is not presented in change-point 
terms. It is one of the first tests introduced in an undergraduate statistics course, and it 
usually is framed as a test for a difference in the mean of two samples. While it is widely 
applicable, it has the obvious limitations that 1) it applies only the univariate case, 2) it 
assumes the underlying distributions are normal, and 3) it only tests for differences in 
distribution means. 

In quality assurance, the classic univariate sequential test for a change point is the 
cumulative sum (CUSUM) test introduced by Page (1954). Others include the sequential 
t-test (Rushton, 1950), Geometric Moving Average (GMA) or Exponentially Weighted 
Moving Average (EWMA) procedures (Roberts, 1959), and Bayes-type procedures 
(Girshick and Rubin, 1952; Shirayev, 1963). These procedures are powerful in many 
settings and can be applied as non-sequential or sequential change-point tests. Of course 
these tests are also limited to univariate cases, although some multivariate extensions 
exist. Additionally, they require assumptions about the underlying data distribution 
(normality is usually assumed). 

2. Multivariate Case 

The multivariate analog of the two-sample t-test is Hotelling's two-sample test 
(Hotelling, 1931), which detects differences in the mean vectors of two multivariate 
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normal samples. Hotelling’s statistic is a non-sequential single test. Sequential 
multivariate parametric tests include multivariate extensions of CUSUM procedures 
(Basseville and Nikiforov, 1993; Runger and Testik, 2004) and EWMA procedures 
(Lowry et al, 1992; Prabhu and Runger, 1997). 


A change-point test by James, James, and Siegmund (hereafter, “JJS”) (1992) 
associated with an abrupt change in the mean of a multivariate normal distribution 
interests us particularly. JJS is a non-sequential simultaneous test—the area of our 
interest in this work—and it serves as a powerful competitor to methods we present later. 
Given multivariate observations JJS uses the modified likelihood ratio 

test statistic 


(2.3) 


r„s= max 


N 


K<k<h k{N-k) 


S.--S. 




r 

k 




k k 

where U^='^X^X[, and k^ and kj are the lower and upper limits, 

/—I /—I 

respectively, of the interval possibly containing a change point. The actual likelihood test 
is based on the case = 1 and kj = A — 1 (that is, the change point could be anywhere 
in the observation sequence), but the test provides the flexibility to limit the search for a 
change point to a subinterval of (l, A — l). JJS show that the tail probabilities for this test 


can be well-approximated by 

R(rjjs >x)^ 


(2.4) 


r(p/2) 


Nx 


pH 


(1-x) 


{N-p-3)/2 






y/2 


t(l —t)(l —x) 


dt, 


where k^/ N and kj / A ^ k as A —> oo , 0 < < k < 1, and 


(2.5) 



2 

^exp 




with 4? being the standard normal cumulative distribution function. The integral term in 
(2.4) is computed satisfactorily by numerical methods. JJS present a heuristic 


9 



modification to improve their tail probability approximation; we do not describe those 
details here, but we do utilize the modification of the JJS test for eomparison purposes 
later. 

A drawbaek to all these parametrie approaehes to ehange-point detection is that 
they are not necessarily robust aeross different underlying data distributions. In 
particular, the assumption of multivariate normality in many cases proves to be diffieult 
to justify. We proeeed to review some existing nonparametric approaches to the change- 
point problem. 

C. NONPARAMETRIC APPROACHES 

1. Univariate Case 

A classic nonparametric test to determine whether two univariate random samples 
eome from the same population is the Mann-Whitney test (Mann and Whitney, 1947), 
also known as the Wileoxon rank sum test (Conover, 1999). Let Aj = {Xj,..., and 

A 2 = be two sets of observations with all X. being members of some 

ordered set. Assign ranks (or midranks in the ease of ties) to the observations with 
respeet to set ordering and let denote the rank of observation A^. Intuitively, one 

would expect that if the observations in Aj tend to be smaller than those in then the 
ranks of the observations in Aj would be smaller on average than the ranks of those in 
Aj . To determine whether Aj and Aj are drawn from the same population one 
eomputes the Mann-Whitney test statistie, whieh is simply 

m 

(2.6) T„^='£r(X,). 

Let fy and be the distribution functions corresponding to the observations in Aj and 
A 2 , respeetively, and let T ~ fy and Z ~ Ej. Then the hypotheses assoeiated with the 
Mann-Whitney test may be stated as follows. 
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H.: P(y>Z) = 0.5; 

( 2 . 7 ) ; 

P(F>Z)^0.5. 

For small samples, quantiles of are found in tables or by using standard funetions in 
statistieal software (sueh the “qwileox” funetion in R (2005)). For large samples is 
asymptotieally normal. While the Mann-Whitney test is eonsistent against mean 
differenee alternatives, it is not sensitive to other types of differenees (for example, 
differenees in seale). 

Another rank-based non-sequential single test, whieh is eonsistent against a 
broader range of alternatives but is less powerful than the Mann-Whitney test (Conover, 
1999), is the Wald-Wolfowitz runs test (Wald and Wolfowitz, 1940). Observation ranks 
are eomputed as above, but this time the ranks are eollected into runs of eonseeutive 
ranks that eome from the same group (either Aj or Aj). The test statistie is simply the 

number of runs in the eolleetion; a relatively small number of runs indieates that the two 
samples are from different distributions. Like the Mann-Whitney test, the Wald- 
Wolfowitz test is asymptotieally normal. 

Two other tests that are eonsistent against any type of differenee that might exist 
between underlying distributions are the Kolmogorov-Smirnov test and the Cramer-von 
Mises test. If 5j and are the empirieal distribution funetions for the observations in 

Aj and Aj, respeetively, then the Kolmogorov-Smirnov test statistie is given by 

(2.8) =sup|5i(x)-52(x)| 

and the Cramer-von Mises test statistie is given by 

( 2 - 9 ) T^,u=-^ E 

[m + n) ;(6AiUAj 

In other words, these tests statisties evaluate the supremum norm and norm, 
respeetively, of the differenee between two empirieal distribution funetions. Large 
values of these statisties are evidenee that Aj and Aj are drawn from different 

probability distributions. 
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These four partieular tests broadly represent the two primary teehniques used for 
nonparametrie ehange-point deteetion both in univariate and multivariate eases: 
teehniques based on rank permutations (such as Mann-Whitney and Wald-Wolfowitz) 
and tests based on distribution function estimation (such as Kolmogorov-Smirnov and 
Cramer-von Mises). The new methods we present in the next chapter are grounded in 
rank permutation arguments. 

Nonparametrie sequential tests include generalizations of CUSUM procedures 
such as those introduced by Bhattacharya and Frierson (1981) and Gordon and Pollock 
(1995). These procedures apply to the univariate case only; no extensions of these tests 
to the multivariate case have yet been proposed (Fricker and Chang, 2009). 

2. Multivariate Case 

a. General Approaches 

Multivariate extensions do exist for both the Kolmogorov-Smirnov and 
Cramer-von Mises tests. For example, Bickel (1969) extends the Kolmogorov-Smirnov 
test to by defining multivariate rank vectors, computing empirical distribution 
functions with respect to within-group multivariate ranks, and then evaluating the 
supremum norm on the difference of the two empirical distribution functions as a test 
statistic. Prsestgaard (1995) extends Bickel’s result to more general sample spaces. 
Baringhaus and Franz (2004) propose a Cramer-von Mises-like statistic on by 
comparing the average Euclidean distance between points in different groups to the 
average distance between points in the same group. Hall and Tajvidi (2002) propose a 
permutation test using a nearest-neighbors approach that generalizes both the 
Kolmogorov-Smirnov and Cramer-von Mises tests to the multivariate case. These tests 
all rely on simulation to compute estimated quantiles for the test statistic null distribution. 

Li and Liu (2004) apply the notion of data depth to nonparametrie tests for 
changes in multivariate location or scale. Data depth is a way of measuring how “deep” 
or “central” a point is with respect to a particular distribution or sample (Liu et al, 1999). 
Well-known examples of such measures in the data depth literature include half-space 
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depth (Hodges (1955) , Tukey (1975), sometimes referred to as Tukey depth), eonvex 
hull peeling depth (Barnett, 1976), and simplieial depth (Liu, 1990). Data depths 
provide a natural eenter-outward ordering of points in a multivariate sample; onee 
ordered, various univariate tests for ehange may be applied with respeet to the ordering. 

Frieker and Chang (2009) propose a sequential ehange-point test whieh 
uses an available history of multivariate observations eombined with the k most reeent 
observations (where k is an adjustable window parameter) to generate a nonparametrie 
running estimate of the underlying density distribution. The history is assumed to be in 
eontrol; that is, the historieal observations are all drawn from the null distribution with no 
ehange point. With eaeh new observation they eompute a new density estimate and then 
perform a Kolmogorov-Smirnov test to identify whether the density heights of the data of 
interest are uniformly distributed among the density heights of the historieal data. 

b. Graph-Theoretic Approaches and Matching 

An intriguing approaeh to the ehange-point problem involves applying 
graph-theoretie ideas. In partieular, methods based on the graph-theoretie eoneept of 
matehing have gained interest in reeent years, in no small part due to inereases in 
eomputational capaeity. The test statisties we propose in this work are all matehing- 
based; therefore, we eonelude this ehapter by providing neeessary graph theory 
definitions and baekground to develop our ideas and reviewing graph-theoretie 
approaehes introdueed by Friedman and Rafsky (1979) and Rosenbaum (2005) whieh 
have inspired our work. 

(1) Definitions. The definitions in this seetion are from Chartrand 
and Zhang (2005). A graph G eonsists of a finite nonempty set V of elements ealled 
vertices and a set E of two-element (unordered) subsets of V ealled edges, in whieh ease 
we write G = (y,£’). A graph Gj=(Vj,£’j) is ealled a subgraph of G = (V,Ej if 

Vj c y and E^ ^ E; if V^ =V then G^ is ealled a spanning subgraph of G . We denote 
the edge joining vertiees u and v by {m,v} . Two distinet vertiees are adjacent vertices 
if they are joined by an edge, and two distinet edges are adjacent edges if they share a 
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vertex. Vertex u and edge {m,v} are said to be incident with each other, and the degree 
of vertex u is the number of edges incident with u . 


A u — v walk in graph G is a sequence of vertices in G beginning 
with u and ending with v such that consecutive vertices in the sequence are adjacent; if 
u = v then the walk is closed. A u—v trail is a walk in which no edge is traversed more 
than once; a circuit is a closed trail including at least three distinct vertices. A circuit 
that repeats no vertex except for the first and last is a cycle. If there exists a u — v walk 
for every pair of vertices u and v in graph G then G is said to be connected. 

A graph G is called acyclic if it has no cycles, a tree is an acyclic 
connected graph, and a spanning tree of G is a spanning subgraph of G that is a tree. 
If a real number is assigned to each edge of a graph, then the graph is a weighted graph 
and the sum of the all the edge weights is called the weight of the graph. A spanning tree 
of weighted graph G whose weight is smallest among all spanning trees of G is called a 
minimum spanning tree (MST). 

Friedman and Rafsky (1979) consider various statistics based upon 
MSTs in order to test whether two samples are drawn from the same distribution. Given 
sets of observations Aj and Aj as above, they construct a MST with respect to some cost 

function on the sample space. One change-point detection method they consider consists 
of removing each edge of the MST that connects a point in Aj to a point in , and then 

defining a test statistic that counts the number of disjoint subtrees that result from the 
edge removal. The resulting test is effectively a multivariate runs test, which corresponds 
exactly to the Wald-Wolfowitz runs test for the univariate case. The Wald-Wolfowitz 
runs test is known to be not particularly powerful (Connover, 1999, p. 3). Friedman and 
Rafsky (1999) demonstrate that their multivariate runs test has high power in higher 
dimensions, and they enhance test power by computing their test statistic on a collection 
of orthogonal MSTs, where two MSTs are orthogonal if they have no edges in common. 
We will use a similar idea to extend our main results for improved power. 
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We require a few final definitions to develop our main results. A 
subset of edges E' is independent if no two edges in E' are adjacent. A matching 
in a graph G = {V,E^ is an independent set of edges in G. A maximum matching in G 

is a matching that consists of at least as many edges as any other possible matching in G. 
For the remainder of this paper, all matchings under consideration are maximum 
matchings (that is, we are interested only in matchings that pair together as many vertices 
as possible). Finally, a perfect matching in G is a matching that includes every vertex of 
G. A perfect matching is necessarily a maximum matching; furthermore, a perfect 
matching is possible only on graphs with an even number of vertices. 

(2) Bipartite and Non-Bipartite Matching. A variety of problems 
can be framed as matching problems, where a matching is sought that minimizes some 
cost (Ahuja et al, 1993, p. 9). Two specific cases are bipartite matching problems, 
where graph vertices are divided into two distinct subsets Aj and Aj and each edge 

consists of one vertex each from Aj and Aj, and non-bipartite matching problems, 

where the matching does not depend on a partition of the vertices (that is, any two 
vertices may be paired in the matching). 

In our case, we are interested in matchings of minimum cost, 
where a cost function is defined as follows: Given sample space S , c:S xS ^ [0,cxe) 
is a cost function if it satisfies 

(2.10) c[x,x) = 0 Vx G S 

and 

(2.11) c(x,y) = c(y,x) Vx,yGS , 

We use c.j to denote the cost and also sometimes use the common 

terminology that c^j is the weight of the edge joining and Xj. We will call cost 
function c a distance function if it satisfies the triangle inequality 

(2.12) c[x,z)<c[x,y)+c[y,z) Vx,y,zES 
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in addition to (2.10) and (2.11) . In general, this framework allows broad flexibility to 
aeeommodate all types of data (diserete or eontinuous, numerie or eategorieal, ete.). In 
the change-point setting, a cost function should be tailored in some sensible way to the 
data types of interest and its selection ultimately does matter in detecting departures from 
the null hypothesis. 

We formulate the minimum-weight bipartite matching problem in 
the following manner. Given sample space S , two distinct sets of observations 
Aj = {Xj,...,X^}c S and Aj = S , = m-f n , and cost function 

c, a minimum-weight bipartite matching is a solution to the problem 

m N 

Z Z 

1=1 j=m+\ 

<1 V/e{l,...,m}, 

j=m+\ 
m 

Z^o-1 Vje{m + 1,...,A^}, 

i=l 

m N 

Z Z 

/=1 j=m+\ 

e{0,l} V/e{l,...,m}, Vje{m + 1,...,A^}, 

where =1 indicates that edge is in the matching and x^j =0 otherwise. In 

the graph underlying this problem, each element of Aj is joined by an edge to each 

element of Aj; no edges join elements within Aj or within Aj. The solution to this 

problem is not necessarily unique. In operations research literature, this problem is a 
particular instance of the “general assignment problem” (Ahuja et al, 1993, pp. 639- 
640). Software algorithms to solve this problem include that of Jonker and Volgenant 
(1987) and are widely available. 

Alternatively, we formulate the minimum-weight non-bipartite 
matching problem as follows. Given sample space S , a single set of observations 
A ={X„...,X„}C S , and cost function c , a minimum weight non-bipartite matching 
is a solution to the problem 


minimize 

X 

subject to 

(2.13) 
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N-l N 

Z Z ^u^ij 

i=\ j=i+\ 

i=l j=k+l 

N-\ N 

1=1 ;=i+i 

^,.£{0,1} Vje{/ + l,...,A^},V/e{l,...,A^-l}. 

where [jJ is the smallest integer less than or equal to y and x-j indicates whether 

i < j, is in the matching as in the bipartite matching case. In this case, every 

pair of elements in A is joined by an edge and the underlying graph is referred to as a 
complete graph. The solution to this problem is a matching that consists o^ N! 2 edges 
with every vertex included in the matching if N is even, or (A/^-l)/2 edges with every 

vertex but one included in the matching if N is odd. As in the bipartite matching case, 
the solution to the non-bipartite matching problem is not necessarily unique. Algorithms 
by Edmonds (1965) and Derigs (1988) solve the minimum-weight non-bipartite matching 

problem on a complete graph in time. Several improvements to Edmond’s 

algorithm have been developed over the years (Gabow, 1973; Galil et al, 1986; Gabow 
et al., 1989; Cook and Rohe, 1999; Mehlhom and Schafer, 2002; Kolmogorov, 2009). 
While these improvements do not improve theoretical runtime on a complete graph, they 
have been shown to improve realized runtimes in many practical instances. Edmond’s 
original algorithm to solve the non-bipartite weighted matching problem is sometimes 
called Edmond’s “blossom” algorithm, where the term “blossom” refers to subgraphs 
with particular properties that emerge during execution of the algorithm. Subsequent 
improvements often carry the “blossom” moniker; the latest is Kolmogorov’s “Blossom 
V” (2009). In Chapter IV, we elaborate on computational details specific to our 
implementation of non-bipartite matching algorithms for this study. 

(3) Cost Eunctions. In both the bipartite and non-bipartite cases, 
the assignment of pairs in a matching depends upon the choice of a cost function. Eor the 
problem of testing for data homogeneity with respect to some ordering, we will use cost 
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minimize 

X 

subject to 

(2.14) 



functions that are reasonable dissimilarity measures. Some applications will invite a 
natural choice of dissimilarity measure; for other applications this choice may require 
some deliberation. In our simulation study we consider four different distance functions 
on the sample spaee S = R"'. The first is Euclidean distance (ED), which yields 


(2.15) 


ED 

= 




1/2 


One disadvantage of ED is that it does not take into account 
measurement scale or correlation among data components. To address these issues, 
Mahalanobis distance (MD) is often used as an alternative, whieh is seale-invariant and 
aeeounts for data correlation. The resulting cost function is 


(2.16) 


MD 

= 




nl/2 


where V is an estimate of the eovarianee matrix assoeiated with the data of interest. 
Estimating V by the sample eovarianee matrix is sensitive to outliers, however, so we are 
also interested in distance measures that are robust to such outliers. Wang and Raferty 
(2002) provide a useful method to downweight outliers, called nearest-neighbor varianee 
estimation (NNVE) that measures how outlying a data point is by the standardized 
distanee between the point and its k*’’ nearest neighbor (where k is an adjustable 
parameter). Therefore, we consider a third distance function, which we will refer to as 
“Mahalanobis distanee, robust” (MD-R), which is simply the MD eomputed with 
reference to the NNVE covariance estimate V^nve omit any k subscript to simplify 


notation here): 
(2.17) 


MD 

‘■/j = 


(x,-x,)'c„(x,-x,) 


nl/2 


Another way to reduce sensitivity to outliers is to eonsider a 
distance measure based on the idea of multivariate ranks. We adopt a useful extension of 
rank to higher dimension given by Barnett (1976) and Chaudhuri (1996), described as 
follows by Choi and Marden (1997). Consider a d -dimensional inner product space S 
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and observations G S . For d = \, denote the rank of observation by r^‘\ 

assigning midranks in the ease of ties. Center and seale these ranks by the transformation 

0 _ u _ 1 

(2.18) R{X,)= ^^ 

n 

whieh maps |,..., | into the open interval (— 1, l). Then (2.18) ean be written 


(2.19) 



where sgn is the signum funetion 


( 2 . 20 ) 


sgn(v) 


0 

X 

Ul 


if V = 0; 
otherwise. 


Now for ^ X ., the summand in (2.19) ean be expressed as 


( 2 . 21 ) 


sgn 


(v-v 


X, - X 




Then a very natural extension of this eentered and sealed ranking transformation to the 
ease J > 1 is obtained by defining 


( 2 . 22 ) 




X.. - X. 


n— X-X. 


where ■ may be any norm in general; unless otherwise speeified we will use 


X = yJx'X VX G S. We will use the term “multivariate rank distanee” (RD) to refer 


to the Euelidean distanee between multivariate ranks as our fourth and final distanee 
option: 


(2.23) 


RD 

Cv = 


(s(x,)-s(x,))'(/;(x,)-/;(x,)) 


nl/2 


In Chapter IV, we examine the impaet these different distanee 
funetions have on our ability to deteet ehange with the new ehange-point deteetion 
methods presented in the next ehapter. 
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(4) Matching Examples. We now illustrate these matching 
concepts with a few simple examples. Figures 1, 2, and 3 show the same 20 
observations, each drawn from a bivariate normal distribution, with Euclidean distance as 
the eost function. The coordinates for these data are listed in Appendix D. In Figure 1, 
the observations are randomly partitioned into two subsets containing 8 and 12 
observations indieated by group labels ‘ □ ’ and ‘ respeetively; the resulting minimum- 
weight bipartite matehing consists of 8 pairs, with 4 observations from the larger set 
remaining unmatched. A quite different matching results when the observations are 
partitioned in a different way, as shown in Figure 2 with two subsets of equal size. 
Finally, Figure 3 demonstrates the minimum-weight non-bipartite matching on these 
same 20 points with no prior partitioning. These examples reiterate two fundamental 
differences between bipartite and non-bipartite matehings. First, the non-bipartite 
matching is not associated with any prior partitioning of the observation set, while the 
bipartite matehing depends greatly on how the observation set is partitioned. Indeed, for 
an even number of observations one may think of the minimum-weight non-bipartite 
matching as the lowest eost matching among all minimum-weight bipartite matchings 
computed for all possible partitions of equal size. Seeond, in the ease of non-bipartite 
matching for an even number of observations, all observations are paired with another 
(all but one are paired if the number of observations is odd), while in the bipartite 
matching case some observations are necessarily left unmatched in the larger of the 
partition sets. 
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Minimum-weight bipartite matching on 20 points; m=S, n=12. 



Minimum-weight bipartite matching on 20 points; m=10, n=10. 
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Figure 3, Minimum-weight non-bipartite matching on 20 points. 

(5) The Cross-Match Statistic. With these ideas in place, we 
conclude this chapter by reviewing a recent non-bipartite matching-based approach to 
change-point detection by Rosenbaum (2005). Rosenbaum presents an exact 
distribution-free test to detect change in multivariate data using non-bipartite matching. 
We explain his test in a bit more detail than the previous tests, as his ideas have 
substantially motivated our thinking on this problem. In fact, one of our results extends 
his single non-sequential test to a simultaneous test. 

Consider the case where Aj = {Xj,...,X^} is the set of the first m 
observations and Aj = is the set of the last n = N — m observations. 

The goal is to test for equality of distributions for the populations associated with Aj and 
Aj. Compute an optimal non-bipartite matching on Aj U Aj and let M*' denote the 
number of pairs cross-matched between Aj and A 2 . Rosenbaum (2005) calls 
(which he denotes “ A, ”) the cross-match statistic and derives its exact null distribution. 
For the case where N is even, this distribution is given by 
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(2.24) 


piM"^ =k] = 


2'‘ {N/2)l 


m — k 
2 


\k\ 


n — k 
2 


where k takes even values from 0 to min(m,n) if m and n are both even, k takes odd 

values from 1 to mm(m,n) if m and n are both odd, and p[m^ = kj = 0 for all other k. 

The ease where N is odd may be accommodated by introducing a pseudo-observation 
such that c(X;,X^^j) = 0 V/, computing a non-bipartite matching on the pooled 

sample , and discarding the pair that includes the pseudo-observation. 

Equation (2.24) is then adjusted to refer to the remaining N— I observations and 
[N — \)/ 2 pairs, conditioned on the set membership of the observation with which the 
pseudo-observation is paired. The distribution for odd N becomes 

I P\M = k| is paired with an element of Aj | 

X E jx^^j is paired with an element of Aj | 

\P[M‘^ =k\ X^_|_j is paired with an element of Aj 

+ 


p[M^ =k] = 


XL? V-VJ. VVXXXX 14.XX V'XV/XXXV'XXX / %2 

X E jx^^j is paired with an element of Aj} 


2"((X-l)/2)! 


N-\ 
m —1 


m —k — 1 


!k! 


n — k 


^■7 ({m — k = lmod2}) 


2'((X-l)/2)! 


-h- 

'x-T 

m — k 

!k! 

n —k —ll 




2 J 

[ 2 J 



7 ({m — k = 0 mod 2}) 


(2.25) 


2’^ [{N-\)I2)\ 


7({m —k = lmod2}j 7({m —k = 0mod2}j 


(N) 

k! 


m —k —l) 

in —k 

1 

f 1 \ 

m — k , 

in—k—I 

m 

V 

[ 2 J' 

i 2 J 


[ 2 J' 

2 


where 7 (■) is the indicator function and the factorial terms that depend on m or n are 
only computed when the factorial argument is an integer (since otherwise the indicator 
function in the numerator is zero). In this case, k takes values from 0 to mm(m,n). In 
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any case, the null hypothesis that the distributions underlying Aj and Aj are equal is 

rejeeted for small values of , sinee different underlying distributions lead to a 
preferenee for within-group matching over cross-group matching. 

While (2.24) and (2.25) are exact probabilities, in practice these 
values ean be difficult to compute for large N. Rosenbaum proves that under the null 
hypothesis, the conditional distribution of M'' given m converges in distribution to the 
normal distribution iox ml N p (constant); that is, 

(2.26) - ^ -^A^(0,1), 


where 

(2.27) 


A. 


E\M^ 


mn 


N-\ 


and erf 


Var[M^' 


2m(m —l)n(n —l) 
(A,_3)(N-1)» 


for even N. This enables application of the cross-mateh test to oases where N is large. 

We provide a small example to illustrate this test: Figure 4 shows 
the familiar data cloud from previous figures; in this case the data are assooiated with two 
subgroups of equal size. Group labels ‘ □ ’ and ‘ ’ are assigned at random to model two 
groups of 8 and 12 points, respectively, that are drawn from a common distribution. 
Cross-matches are ciroled; for this case Rosenbaum’s cross-match statistic takes the value 
=4 and has an assooiated p-value of p{m^ < 4j = 0.48 . Clearly no significant 
evidence exists to infer that the distributions underlying the two groups are different. 
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Figure 4, 


Figure 5. 



Rosenbaum’s cross-match statistic with no change point. 



Rosenbaum’s cross-match statistic with change point. 
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On the other hand, Figure 5 shows the same seatter plot, where this 
time one group eonsists of the lower-left hand observations (‘ * ’) and the other group 
eonsists of the upper-right hand observations (‘ □ ’) . Again, cross-matehes are eireled, 
and in this ease M*' = 2 with an assoeiatedp-value of p[m'^ — ~ p-value 

is certainly stronger evidence that the two groups have different underlying distributions; 
for this small sample size the smallest achievable p-value is p[m^ = Oj = 0.0014. 

Other examples of optimal non-bipartite matching techniques 
applied to statistical problems are found in Lu et al. (2001), Lu and Rosenbaum (2004), 
and Greevy et al. (2004). In an observational study of a media campaign against drug 
abuse, Lu et al. (2001) use optimal non-bipartite matching to pair teen subjects in such a 
way that each pair is demographically similar but has markedly different exposure to the 
media campaign. The evaluation compares stated intentions related to illegal drugs 
among comparable teens to assess the effectiveness of the campaign. Lu and Rosenbaum 
(2004) transform a tripartite problem of comparing one test group to two control groups 
into a non-bipartite matching problem in order to evaluate whether or not a localized 
minimum wage increase could be associated with depressed low-wage employment in 
that area. Greevy et al. (2004) demonstrate that optimal non-bipartite matching leads to 
improved covariate balance over randomized block design in a study of a cardiac 
function treatment for child cancer survivors. 

The tests we propose belong to the small but growing category of 
graph-theoretic tests for homogeneity that involve minimizing sums of interpoint costs on 
graphs; this category includes Friedman and Rafsky’s (1979) MST test and Rosenbaum’s 
(2005) cross-match test. The null hypothesis of homogeneity implies that the structure 
of these graphs is indifferent to group labeling, which permits the derivation of null 
distributions by straightforward arguments. We will apply this principle to derive null 
distribution results for our tests. 
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In summary, much room remains for innovative work in the field 
of multivariate, nonparametrie ehange-point detection. Particularly, very few change- 
point tests exist with the properties of being multivariate, simultaneous, and distribution- 
free. We proeeed now to present tests with exaetly these properties. 
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III. THEORETICAL RESULTS 


Suppose that we have an even number N = 2n> A observations {Xj,..., } 

ordered with respeet to time (or any other variable) and we want to test whether the 
observations are ehanging with respeet to this ordering. For example, we might want to 
test for a jump or a drift beginning at some unknown point in the sequenee. The 
observations may be multivariate, but are assumed to be independent. The requirement 
that N be even is not strietly neeessary, but it does simplify the exposition (we explain 
later how our results extend to odd sample sizes) . 

For a non-sequential simultaneous test where denotes the distribution of the 
i * sample value, the null hypothesis of homogeneity asserts that = F^ =■•• = Fj^ 
without speeifying the eommon distribution. The alternative hypothesis asserts that there 
exists an integer r, l<To<r<Tj<A — 1, sueh that Fj = F) = '' ’ = K-\ ’ K-\ ^ K ’ 

and h (Fj , F). j — max^^^^ F (F). , Fj. j is strietly positive over j G {r,..., A} , usually with 

Tq = 1 and Tj = A — 1 as diseussed previously. With respeet to a given eost funetion we 
eompute an optimal non-bipartite matehing M Xjjj i ’^;2 }} ’ 

R^,R 2 ,...,R 2 „ denote the sequenee labels assoeiated with eaeh observation sueh that if 
** ^i^ted pair, then F 2 ,_i = 72 ,_i and ^ 2 ,. = . Finally, order eaeh 

individual pair as {u , where t/; and Y. are the minimum and maximum ranks of the 
ordering variable respeetively: 

(3.1) t/,. = min|F 2 ;_j,F 2 ;} and = max|F 2 ,_j,F 2 ,}, /e{l,...,n}. 

With this setup in plaee, we are now ready to propose new ehange-point tests based on 
the ordered rank pairs assoeiated with an optimal non-bipartite matehing. 
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A. 


THE SUM OF PAIR-MAXIMA TEST 


1. The Statistic 

Under the alternative hypothesis that some distribution change has occurred 
within a sequence of observations, one expects an optimal non-bipartite matching to pair 
observations that are closer together in sequence than would be the case under the null 
hypothesis. This suggests summing the differences Y. -t/, across all pairs and rejecting 
the null hypothesis if the sum is less than some critical value. We consider an equivalent 
test statistic based on its relationship to this sum: 

( 3 . 2 ) r.=Zr=U(2«+i)+^Zh-t'T 

i=\ ^ ^ i=l 

We call the Sum of Pair-Maxima (SPM) test statistic, with rejection of the null 
hypothesis indicated by small values of this sum. We proceed to derive the mean and 
variance of , and show that has a limiting normal distribution by invoking a central 
limit theorem result attributed to Stein (1986). 

2. Expected Value and Variance 

A sequence (Zj,...,Z^) of random variables is said to be exchangeable if for any 
permutation n of indices {l,.... A:}, the joint probability distributions of (Zj,...,Zj.) and 

|z^^jj,...,Z^j^jj are identical (Fristedt and Gray, 2004). Under the null hypothesis, each 

of the N\ possible assignments of sequence labels is equally likely and the random 
variables (T|,...,T„) are exchangeable. To obtain expressions for the expected value and 
variance of we apply equations (3.3)-(3.5), which are derived in Appendix A: 

(3.3) P{f=t)= t = 2,...,2n, 

n(2n -1) 
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(5-l)(n-l)(2n-3) 

t-3 

(n-l)(2n-3) ’ 


, t = 2,...,s-\ 


t = s + \,...,2n , 




_ 2(2n + l) 


E[Y,Y,] = 
Var(yi) = 


(2n + l)(3n + l) 

3 ’ 

8(2n + l)(5n + 2) 
45 ’ 

(2n + l)(n-l) 


Coy{Y„Y,) = - 


4(2n + l) 


The following are now immediate: 

r 1 r 1 2n(2n + l) 

M.=E[T,] = nE[Y,] = ^ - 

(3.6) o-^ =Var[r^] = nVar[yi] + n(n-l)Cov[yi,T2] 

n(n-l)(2n + l) 

~ 45 ■ 

3. Using Stein’s Method to Establish Asymptotic Normality 

One important result of our researeh is the establishment of a eentral limit 
theorem for , namely that 


P\ ^ asA 


for every -go <t <co , where 4> is the standard normal eumulative distribution funetion. 

n 

While the Y2s in the sum = '^Y. do have identical marginal distributions, they are 
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not independent so the elassieal eentral limit theorem is not applieable here. Instead, we 
prove (3.7) by a teehnique referred to as Stein’s method. Stein’s method (Stein, 1972, 
1986) is based on a simple differential equation that characterizes the normal distribution, 
and an idea called “coupling,” which involves the construction of auxiliary random 
variables that are “close” to the variables under investigation. This method establishes 
bounds on the distance from normality for certain cases of dependence, including our 
case. 


We exploit the combinatorial structure of to construct an exchangeable 
coupling that allows us to invoke Stein’s results. Let /cr^. On the same 

probability space on which is defined we construct a random variable “close” to , 
which we denote , by selecting distinct integers u and v, w < v, at random from 
l,2,...,n (with n>3), switching R 2 uand and taking the sum of the modified pair 
maxima. For to be an exchangeable pair simply means that 

p{t^ =t,fn = ^) = ^(^« = pTn = for ^ ^rid t . This symmetry may be shown as 
follows. Let 

(3-8) j;(“.'')=Ei', + Er + Er 

(=1 /=M + 1 (=V+1 

be the sum of pair-maxima for all pairs in a matching except for the and v**' pair, 
where denotes the pair-maxima of the pair as before. Therefore, 

= To (w, v) + max ( R ^^,, J + max ( R ^,,, ), 


for any choice of u and v . Conditioning on Tg (m,v) gives 


(3.10) 

^ h=‘.t,=0=E E = >.t ='To (“.>') 

( m , v ) tQ{u,v'j 


C(M,v))p(ro(M,v) 


^0 




1 
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Now let (m,v) < ^2 (m,v) < ^3 (m,v) < ^4 (m,v) be the ordered values of ^2«-i ’ ^2 h ’ 
i? 2 v_i, and i? 2 y • Then, conditional on Tg (w, v), the only values and can assume are 
hj (m,v) = Tq (m,v) + ^2 (m,v) + ^4 (m, v) and hj (m,v) = Tq (M,v) + a 3 (m,v) + ^4 (w, v). It 
follows directly that 

p(T^=b^[u,v),ff^ =hi(M,v)|ro(M,v)) = 0; 

P(r^ =h 2 (M,v), 7 ; =hi(M,v)|ro(M,v)) = ^; 

(3.11) ^ . 

p{TN=b,{u,v),T^ =b^{u,v)\T^{u,v)"^ = --, 

p{Tj, =b^{u,v),f^ =b^{u,v)\T^{u,v)^ = ^-, 

so the joint conditional probability distribution P^r^,7^y|ro (w,v)j is symmetric. 
Therefore, by (3.10), p{T^=t,f^=t) = P{T^=t,f^=t) and so (7)v,7iv) is an 

exchangeable pair. Now define =(7iv ~^n- the same manner is 

an exchangeable pair. 

Moreover, where = y„+};-y„-7^ with 

y„ =max(P 2 „_i,^ 2 v) and ); = max(P 2 „,i? 2 y-i) • Then for any /e{l,...,n}, 
E[Y.\W^] = n^T^ and 


(3.12) 


where 


^ryi^y 1 = Qn = ( 2 n + l)( 2 n-l) __ 

L-' 2n(n-l) 3(n-l) 2n(n-l) 


1 1 n jj^ 


2 n j-l 

= Z Z Z Z (^2j,-k ^ ) = Z Z (b 2 ) - Tn 

k =0 {=0 j ^=2 21=1 2=2 1=1 

2 n 

= 2HJ-i)-T, 


(3.13) 


2=2 


2 n( 2 n + l)( 2 n -l) 


-T 
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Combining these expressions gives 


(3.14) e[w„\W„] = (\-X,)W,, 

where 


(3.15) 


= 


2n-\ 


{n-\y 

From Theorem 2.5 of Rinott and Rotar (2000) we have 


(3.16) 


/I 


R(lT^<t)-0(t)<— Var £ (W,-W^f\W, 




W -W 

’'n ’'n 


whieh in the present ease reduees to 


(3.17) 




6-45 


3/4 


7/4 


£ A„, 




We now show that n "^VarjF'j^A^.^|VF^^|^0 and n j^O from whieh 

|r(W^ <t)-0(t)| ^0 will follow. Because the second condition easily follows from 
the fact that | A^.^ J < 2n-3 , we focus on the first condition. We cite two results that 
will be useful; 

Lemma 3-1 : Suppose that 3j and Sj are cr - fields with 3j c . Then for any 
random variable U that is measurable with respect to both 3j and 32 


(3.18) yav{E[U I 3i]} < Var{ e[U | 32 ]} . 

Proof of Lemma 3-1 : See Theorem 34.4 of Billingsley (1986), p. 470. Let 
V=E[U\^y, giving £[y | 3i] = £[t/| 3;] and Var{£[[/| 32 ]} = 

Var(y)= Var{£[y |3i]}+ £{Var[y|3i]} > Var{£[y | 3i]} = Var{£[t/| 3;]} . ■ 
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Lemma 3-2 : Let mand A^be positive integers with 2m<N, and let (;z-j,;z' 2 ) 
denote the first and last m elements of a random permutation of size 2m taken from the 
integers Then for any real-valued funetiong satisfying < oo , 

(3.19) Cov[g(;z-i),g(;z-2)] = -Cov,[g{7r,),g{7r^)](p-J,„-l), 

where CoVj[g(;z'j),g(;z' 2 )] refers to the eovarianee taken over randomly seleeted pairs 

(N -m)\(N -m)\ 

of m-permutations with at least one eommon element, and - -r^ -—^ is 

’ N\[N-2my. 

the probability that two randomly seleeted pairs of m-permutations have no element in 
eommon. 

Proof of Lemma 3-2 : Let £'Q[g(;rj)g(;r 2 )] = jU^ denote the expeeted value of 
the produet gi^n^^gi^n^) where the permutations and are ehosen independently 
from [\,2,...,N] , and £'[g(;z'i)] = //^. Then 

(3.20) E^[g[n,)g{n^)\ = p^,„E[g[n,)g{nyy\ + [\-p^,y)Ey[g[7r,)g[7ryy\, 


where (•) refers to expeetation taken over pairs of permutations that have at least one 
element in eommon, so 

£ (^1 ) ^ (^2 )] = (^'o [<? (^1 ) ^ (^2 )] - (l - PiV,m ) [<? (^1 ) ^ (^2 )]) 

= PN\m (Pg - (l - PN,m ) [<? (^1 ) ^ (^2 )]) • 


(3.21) 
Therefore 


(3.22) 


Cov[g(;z-i),g(;z-2)] = E[g{7r,) g{7r^)\- 

= pin. (Pg - (l - PN,m ) [<? (^1 ) ^ (^2 )]) ' 

= (E,[gMg{^2)]-Pg)(pim-^) 

= -CoVi[g(;ri),g(;z-2)](p^‘„-l) 


Pg 


as asserted. 
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By Lemma 3-1 it is sufficient to show that n‘’Var|£'[A^.„ J ^ 0 for 3^ = 
where 


(3.23) 


^[AkJ3^] = 2 


n r-l 

r=2 s=\ 


-l) 


Taking the variance yields 


Var{£[A^,J3^]} = 2 


(3.24) 


Var A^,j2^ Cov^A^.j 2,A^^3j 

n(n-l) n(n-l) 

Cov ^ A^^ 2 ? A^.3 4 ^ 


+ (n-2)(n-3)- 


[(n -l) 


Now use the fact that | A^.^^ |<2n-3 to show that the first two terms on the right in 


(3.24) go to zero when multiplied by 


.-4 


n 


Cov[a^„,j,aJ,„]<(2«-3)'(p;,,-1) 


where 


By Lemma 3-2, 


(3.25) 


(iV-4)(iV-5)(iV-6)(iV-7) 

iV(iV-l)(A^-2)(iV-3) 

8(2A^-7)(a^"-7A^ + 15) 
iV(iV-l)(iV-2)(iV-3) 


It follows that P{Wff < t) -0(t) ^0 as claimed. ■ 


4. An Improvement to the Normal Approximation by Edgeworth 
Expansion 

For small or moderate n, the error associated with the normal approximation may 
be unsatisfactorily large. This error can be reduced slightly by an Edgeworth expansion, 
which approximates the distribution of interest using the normal distribution plus higher 
order corrections that adjust for non-zero moments of third order and above. We derive 
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an Edgeworth expansion to inelude the skewness of , k^=E - ju^ 
Using eonditioning arguments as before, we have 

(2n + l)(24nVl5n + l) 


as follows: 


15 


(3.26) 


£[y=]: 

£rj,ij.l=42« + l)(l5«^ + 10« + l) 


45 


E[YJ,Y,] = 


16(2n + l)(70nV49n + 6) 


945 


See Appendix A for details. Now write 
(3.27) 


r»=Sr’+3Zi'A+Zrri'. 

i^j^k 


i=\ 


and apply exehangeability to obtain 

E\Tl~] = nE\Y^^ + l>n{n-\)E\Y,%~\+n{n-\){n-2)E[Y,Y^Y^] 


(2n + l)(24n" +15n + l) 4(2n + l)(l5n" +10n + l) 

- »’-2-2 _l_3^(^ _U-1- L 

15 ^ ’ 45 


= n 


(3.28) 


+ n(n-l)(n-2) 


16(2n + l)(70n"+49n + 6) 


945 


Therefore, 


(3.29) 


= (1 i20n" + 1204n' + 236n" -43n + 3). 

945 ^ ’ 


)' ] = ^[T’w ] - - mI 

-n(n -l)(2n + l)(2n + 3) 


945 


= /c,. 


Note that /r 3 < 0 Vn > 1, so is negatively skewed for all oases (sinoe by assumption 
we oonsider oases with at least two pairs). 

Now let Ej^ be the distribution funotion for the standardized random variable 
= (r^ - //^) / cr^, and let = e[z^ ] denote the oumulant of Z^ . In partioular. 
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(3.30) 


<^N 21.^n(n -l)(2n + 1) 


Then the Edgeworth expansion for with respeet to the standard normal distribution 
funetion O may be written 


(3.31) 


^iv (^) = ® ^) 

\26yf^ -l)(2n + l) 


(Wallaee, 1958). It is evident that the Edgeworth expansion (3.31) makes a positive 
eorreetion to the normal distribution outside one standard deviation from the mean, and a 
negative correction inside one standard deviation. Table 1 shows the critical values for 
N = 200 obtained by Edgeworth expansion compared to estimates obtained by 
simulation and normal approximation. 


a 

Simulation 

Edgeworth 

Normal 

0.001 

12746 

12749 

12750 

0.005 

12854 

12857 

12858 

0.01 

12908 

12910 

12911 

0.05 

13050 

13054 

13054 

0.1 

13128 

13130 

13131 


Table 1. Critical values of the statistic for A^=200 estimated by 10,000 

simulations, Edgeworth expansion, and normal approximation. 
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The improvement appears rather small, but taking the skewness of into aeeount in an 

Edgeworth expansion provides improvement nonetheless and reduees SPM test false 
alarm rates slightly relative to the normal approximation. Appendix B lists approximate 
critical values for various significance levels and sample sizes using (3.31). 

5. Treatment of Odd Sample Sizes 

If the sample size is odd, non-bipartite matching will leave one data point 
unassigned. Conceptually, it poses no difficulty to extend our results to this case as we 
now proceed to do. Let N denote the total sample size as before. For clarity we let 

and denote the sum of n matched-pair maxima in the even (N = 2n) and odd 
{N = 2n + \) cases respectively. Define a stochastic replica of as follows: 

1) Select integer u at random from the set {1,2,..., 2n +1}; 

2) If M<2n take + 

3) Otherwise take . 

Equivalently, + where is an independent Bernoulli 

random variable with success probability 2n/(2n + l). Using this expression, the 
expected value and variance are obtained: 

= £p,,] = eUo] + 

2n + i 5 

2n(2n + l) 2n 4n(n + l) 

(3.32) =—5^- ’- + — - ’- 

3 3 3 

(2n + 2\ 

- Mno ~ ~T ’ 
y 2n 4" 1 y 
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(3.33) 


<, = Var[f„,] = £[var[f„, | 5 ,]] + Var[£[f„, 

n(n-l)(2n + 7) 2n n(n + l)(2n + 3) 

^ 45 ^ 45 

2 (n + l)(2n + 3) 

(n-l)(2n + l) 

In standardized terms, the eorreetion made by adding an observation to the even ease 
beeomes negligible as the sample size inereases and does not affeet the asymptotie 
normality argument of the previous seetion. 

6. On the Consistency of 

Henze and Penrose (1999) establish that Friedman and Rafsky’s minimum 
spanning tree test is eonsistent against all alternatives for the two sample ease 

~G, H^ :F = G, H^ :F^G, F and G unknown. 

Rosenbaum (2005) argues for the eonsisteney of his eross-mateh statistie distributed 
as in (2.24) and (2.25) against alternatives of the form 
F ^G ,..., X^^^ by showing that it is a eonsistent test for eomparing 

two discrete distributions with finitely many mass points. He heuristieally extends that 
eonsisteney argument to the general ease by the faet that any two distributions may be 
approximated arbitrarily elosely by two diserete distributions with finitely many mass 
points. We do not try to formalize that argument here; rather we theoretieally motivate 
the use of the SPM test under alternative hypotheses by proving a eonsisteney result for 
that depends on the eonsisteney of : 

Proposition 3-1 : is eonsistent against all alternatives of the form 

(3.34) X„...,X^^,-F^G-X^,...,X^ 3m e {2,...,iV} 

against which is consistent, where is Rosenbaum’s eross-mateh statistie as 
defined in (2.24). 

We prove the ease for even N and a ehange point that divides the observations 
into two subsets eaeh eontaining an even number of elements; the same reasoning 
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extends to odd oases. Let N = 2n, suppose Mj +1 is a ohange point, Mj and 


= N — M^ are both even, and / N ^ oonstant as N ^ oo . Adopt the subsoript 

notation “0” or “1” to denote probabilities (or expeotations or varianoes) taken under the 
null or alternative hypothesis, respeotively. We will show that 




< 


(T, 


1 as A 


cxe 


for any signifioanoe level a , where denotes that 


a- 


quantile of the standard normal distribution. We build our proof on the following faet: 


Lemma 3-3 : Let denote the random variable obtained by matehing among 
the first Mj points alone (eall that set of pairs ), matehing among the last points 
alone (eall that set of pairs ), randomly ehoosing k pairs in and k pairs in /^ (k is 
even for this ease sinee and N are all even), and randomly swapping one 

element from eaeh seleeted pair in P^ with one element from eaeh selected pair in P^ 
(each pair gets one swap), and finally computing the pair-maxima sum on the new pairs. 
Let A; be the change in the pair-maxima sum due to the swap, i = \,...,k. Then 
under null or alternative hypotheses, 

(3.35) is distributed the same as conditional on = 2k 


and 

(3.36) 


^N\k 


T +T 

^ M. ' AA 





where k G |0,...,min(Mj,M2)/2}, the A; are exchangeable, and each A,, is 
independent oi . 

Proof of Lemma 3-3 : That is distributed the same as conditional on 

= 2k is a consequence of all within group matchings being equally likely and all 
cross-group matchings being equally likely. Each swap constitutes two cross-matches. 
MM * 

^N\k ~ -^^ + X] from the fact that by the construction of r^|j,, 

2 
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M 

before any swaps oeeur the total pair-maxima sum is -|- Mj , where is 

M 

the pair-maxima sum over and -|- Mj is the pair-maxima sum over . After 


the swaps the final pair-maxima sum inereases by a total ^ A; relative to the pre-swap 

1=1 

sum. Exehangeability and independenee are by eonstruetion of . ■ 

Proof of Proposition 3-1: By (3.36), 


^0 ~ ^0 




= E. 


(3.37) 


1 ^ 1 MjM, 

1 F T 1 ' 2 

e ,[ m ^' 


^ 2 


M * 

r„+7'„,+w,-^+E4 

^ ( = 1 


E, A, . 


=2k 


Now solve for E^ [Aj direetly by substitution using (2.27) and (3.6): 
(3.38) 


^0 A, 


^0 '^M, ^0 MjMj 


Eo 


(A^-1) 


2 (A (A^ +1) - Mj (Mj +1) - Mj (Mj A1)) - 3MjM2 
3MjM2 

2(A(A + l)-7rjA(7rjA + l)-(l-7rj)A((l-7rj)A + l))-37rj(l-7rj)A" 


37rj (l —TTj) 


(/v-i) 


N-l 


Alternately, Eq [Aj] ean be found more direetly by noting that with every swap the pair- 
maxima value from P^ gets replaeed by the pair-minima value from P^ . Denoting the 
pair minima of the first swapped pair in P^ as t/j.j and the pair maxima of the first 
swapped pair in P^ as Fj.j, we have 


(3.39) 


E A = E U —Y\ = 

^0 ^1 ^0 1:2 -' l ;! I 


Mj + 



2(Mj+l) A-1 

+ 3 

) 

3 3 
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Since equation (3.36) holds under both null and alternative hypotheses, we 
proeeed as above to find an expression for ]: 


E, 7 ; = E, 


E, T, 




M ^ 

^ /=1 


M^' = 2k 


(3.40) 

= E, 

j-,, 

+ ^o 

T.,\ 


= E, 


+ Eq 

T„,[ 


M.M, 

1 1 2 1 

e^[m^' 

2 

2 

M.M, 

1 1 2 1 

iN-1) 

1 1 

2 

. 6 , 


^0 A, 


so 


(3.41)£, T, -E, T, = 


N-1 


M 


c' 


-EAM^\\ = N 


'N-\] 

1^1 


-Eo 


. 6 , 

1 N 


Rosenbaum argues that 
suffieiently large N 


e\m^\-e^\m^ 

N 


6 <0 as N ^ 00 ; therefore, for 


C i2) ^'f4-£.[r>.] ,y(y-i)(g/2) Sn{n-i)6 




6cr, 


^iV(iV-2)(A^ + l) 


—00 as A /'—>00 


again using (3.6). Now we need one final lemma to eomplete our proof: 
Lemma 3-4 : 

VarJrJ 


(3.43) 


Varo 


0(1) 


where 0{ ■] is the standard Landau notation. 


Proof of Lemma 3-4 : We rely on two faets that follow direetly from 
Rosenbaum’s argument for the eonsisteney for his eross-mateh statistie; namely, 

E^ \m^] = 0[N) and Yaq \m^] = 0[N) . First, expand Vap [f^] by eonditioning: 


Var, 


(3.44) VarJr^] = Vaq[£,[r^ W\\ + E,\ 

Now express eaeh term on the right hand side using (3.36). The first term is simply 
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Var. 


E, T, M' 


= Var, 


= Varj 


M ^ 

r„,+r..+w,-^+EA 

^ ? = 1 


= 2k 


£.[r„,l + £.Kl+^+^‘5M 


= 0 V" Var; M 


(3.45) 


= 0[N^). 

For the second term, we first note that and are independent by construction, so 


Var; M 


= E, 


Van 


M ^ 

n.+n.+M.-^+^A, 

^ /=! 


M^' =2k 


(3.46) 


= Var^ I + VarJ I + Vap 


0 


E4 


M^' =2k 


+ Cov„ 




/=1 


M^' = 2k 


It is clear that the exchangeable s are negatively correlated, since A. = U -.2 — Y..^ with 
the U ..2 negatively correlated, the l^.j negatively correlated, and the t/^.j independent of 
the }).j’s. Therefore, 


Van 


EA 


=2k 


^Varo[A.|M^ =2/t] + ^Cov, 


(3.47) 


M 


c' 


Varg [Aj] + 


M^' (m^' 


A.,A. M^' =2k 

* J 


Covo Ai,A2 


< 


M 


ct 


Varg f/i.2-F„ =M^O[N"]. 


Furthermore, the A. s can be seen to be negatively correlated with and , since 
larger values are associated with larger FJ.j values and larger values are 
associated with smaller U ..2 values. So, 
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Var, 


<E, 


Varo T +Var, +Var, 


0 TA/, 


EA, 


M^' = 2k 


(3.48) =£i[6>(A^') + M^6>(A^' 

= o[n^]. 

Combining (3.45) and (3.48) results in Var[r^] = , and so 

Varj [r^ ] / Var^ [r^ ] = O (l) as claimed. 

Finally, to conclude the proof of our consistency proposition, we apply 
Chebyshev's inequality. Choose any real number 5 > 0 and any significance level a . 
Then 

\>p, (I?; [7;]| > [t;]) >p[t.-E, [T,] > [T, 


(3.49) 


= P 


>P^ 




T —F T 


> 



^Vaq 

\t 

[ N, 

+ ^i [Pn]~Eo [T’iv] 


'Var„lr„] 


> 


for sufficiently large N, 


since 


as 


VVar„[r,l 

Varj[r^]/Varj[r^] = 0(l) by Lemma 3-4 and (F'; [r^]-£'o [T^])/cr^ ^-oo 
N ^ oo by (3.42). The inequality (3.49) is true for any 5>0, so 

T — '' 

Fn 


■< Z. 


cr. 


\ asN ^ oo and the proposition holds. 


7. A Graphical Example 

We give a graphical example to illustrate how the Sum of Pair-Maxima test 
works. Consider the same set of 20 points drawn from a bivariate normal distribution 
used for illustration in Chapter II (see Appendix D for data values). Now suppose that 
associated with each observation is some sequence label, for example, the time order of 
the observation. Figures 6 and 7 show two such cases, where the plot symbol for each 
data point is its sequence label, and the data are paired with respect to the optimal non- 
bipartite matching on the set (compare to Figure 3 in the previous chapter, which is the 
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same plot except without sequence labels). To represent a sample whose underlying 
distribution has not changed with respect to sequence label, the sequence labels are 
assigned at random in Figure 6. To represent a sample whose underlying distribution has 
changed with respect to sequence label, the sequence labels are assigned from lower-left 
to upper-right in Figure 7. 



Figure 6, Minimum weight non-bipartite matching on 20 points with no 
change in underlying distribution with respect to observation order. 

Now we compute the SPM test statistic, , for each case and consider whether 
or not this test rejects the null hypothesis at significance level a = 0.05 . For N = 20 , 
the quantile table in Appendix B gives Tj™* = 129 . In the case of Figure 6, one readily 

computes =15 + 19 + 18 + ll + 20 + 17 + 8 + 5 + 14 + 16 = 143>129 = r 2 ™‘, and so 

the null hypothesis is not rejected. In the case of Figure 7, 
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=2 + 4 + 6 + 10 + ll + 13 + 17 + 16 + 18 + 20 = 117<129 = r 2 f, so the null 

hypothesis is rejected and we conclude that the underlying distribution is changing with 
respect to observation order. The nature of change in this example is perhaps extreme, 
but it serves to illustrate the sense of the statistic as a change-point test. 



Figure 7, Minimum weight non-bipartite matching on 20 points with a 
change in underlying distribution with respect to observation order. 

We more thoroughly examine the performance of the SPM test by simulation 
study in Chapter IV. As one might expect, while this nonparametric test has a fixed false 
alarm rate regardless of underlying distribution, its power is somewhat low. We find that 
we can dramatically increase the power of this test by considering a particular ensemble 
of such matching statistics, and still retain a fixed false alarm rate. Before we present the 
theory for such an ensemble statistic (in Section C of this chapter), we first introduce an 
alternative to the SPM test that is also based on non-bipartite matching. 
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B. 


THE NON-BIPARTITE ACCUMULATED PAIRS TEST 


1. The Statistic 


The cross-match test statistic proposed by Rosenbaum (2005) is used when there 
is a clear delineation between two groups in a sample (for example, treatment and 
control) and the objective is to test whether the two groups come from the same 
probability distribution. After performing an optimal non-bipartite matching, one counts 
the number of pairs that link across the two predetermined groups. Under the null 
hypothesis of homogeneity this test statistic has a simple, exact distribution (see (2.24) 
and (2.25)) that can be approximated by a normal distribution in large samples. We now 
consider an extension of Rosenbaum’s test to the situation where a sample is ordered 
sequentially but no prior subdivision of the observation set exists. 

As before, suppose we have N = 2n observations and an associated optimal non- 
bipartite matching M ={(t/j,}j),...,(t/„,y„)}. Let = ^{U.,Y.)\Y.<k;i = l,...,n^ 

denote the number of pairings in a non-bipartite matching that occur within the first k 
observations, 2<k<N-\. The cross-match statistic M*' is equivalent to ^, and so 
an exact expression for the probability mass function of ^ under the null hypothesis 
follows directly from (2.24): 



where P(^M^ = r^ = =k-2r^ for observation subsets of size k and N — k. 

Observe that ^ can be expressed in terms of the matched-pair maxima Y. by noting 
that the event ^ > r is identical to the event | j | Y. <k}| > r , which in turn is identical 
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to the event <k where Y^..^ is the 7 * largest among the matehed-pair maxima 
(taking = N + \ for r>n). Thus, large values ^ are assoeiated with small values 

of 7 and vice-versa. 

Rosenbaum uses the number of cross-matches as a test statistic, rejecting the null 
hypothesis of homogeneity for small values. The number of within-group matches in any 
one of the two groups, ^, is an equivalent test statistic, with large values indicating 

evidence against the null hypothesis. We call the vector the 

Non-Bipartite Accumulated Pairs (NAP) test statistic. Like 7^, a test based on 
rejects the null hypothesis for small values of Yj. 

2. Critical Envelope 

It is possible to develop an exact simultaneous test based on for cases where 
the change point k is not pre-specified. To do this we seek a vector of non-negative 

integers ^ at > • • • > w) (where we omit the test level subscript on the right-hand 

side for ease of notation below) so that the following is true for a given test level a : 

(3.51) P(M^^<q^^,k^<k<k^>\-a. 

We choose < 7 ^^ to be the \-a quantile of the distribution of so that the non- 
simultaneous test at stage k has level d; that is, 
P[Mi^ for some kg < k < kj) < « . The problem, then, is how to select d so that 

the simultaneous test level comes as close to a as possible without exceeding this value. 

To find P[M^ ^ kg < k < kj), we develop a recursive computational 

scheme based on the fact that 

%.N 

(3.52) P(M„ t, <t <*:,) = £ n(r,k„NYP[M,^ „=r). 
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where 7r[r,k,N) = P[M < 7 < A:-11 ^ , and that 7r{^r-,k,N) has a 

recursive form stated in the following lemma. 

Lemma 3-5 : 

;z- ( r; ) = ^ ;z-(r - 1; A: -1, ) 7 ( r - 1 < <7 ^ j ^ ) 

(3.53) +^^^;z-(r;Ar-l,A^)/(r<^,_i^), 

k = kQ+\,...,k^;r = Ov{k-n),...,q^^ , 

where 7 (■) is the indicator function. 

Proof of Lemma 3-5 : Expand /r(r;k,N) as follows: 

(3.54) 

7i[r;k,N) 

[(k-iy2\ 

s=0v(k-l-N) 

l{k-l)/2\ 

s=0v(k-l-N) 

[(k-iy2\ 

= ^ 7r{s;k-l,N)-P(M,_,^^=s\M^,,=r)-l(s<q,_,,,) 

s=0v{k-l-N) 

= 7T{r-\;k-\,N)-P{M,_^^=r-\\M^^=r)-l[r-\<q^_^^) 

+ 7r{r;k-l,N)-P(M,_,^^=r\Mi^^^=r)-l(r<q,_^^^) 
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Finally, we note that 

(3.55) 

P(M,_,,,=r-\\M,^^=r) 

N 

= r-l,M^ ^ = r I item k matched to -P (item k matched to s) 

_ s=\ _ 

P(M,,„=r) 

k-l 

= r -11 item k matched to • P (item k matched to 5 ) 

_ _ 

P(M,,„=r) 

1 P{^ k-2.H-2 ~ ^ _2r 

P{M,„=r) 
to complete the proof. ■ 

To start the recursion take 7i[r;kQ,N) = \, Vr. Let 

(3.56) b{r;k,N) = 7i{r;k,N)l[r<qi^^) 
and 

(3.57) Ai^(^r;k,N) = b(^r;k,N)-b(^r -\;k,N) 

From (3.53) we then have 

(3.58) 7r[r;k,N) =b[r;k - \,N)-^A^(^r;k -1,N) 

which is suitable for efficient implementation in S-PLUS® (2005), R (2005), 
MATLAB® (2008), or other interpreted languages. There is no need for asymptotic 
approximations; finding an exact critical region can be done quickly at any practical 
sample size using trial and error. An implementation for R is included in Appendix C. 
We use the fact here that all information carried through conditioning to the joint events 
Mk_iN = ^ is carried through the single event = s, and that the event 

Mkf^=r implies either ^ = r or ^ = r -1. 
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Using the exact method presented here is far less conservative than the Bonferroni 
method. At sample size A = 20 a nominal a = .05 simultaneous test achieves test level 
.046 using a = .021 while the Bonferroni test achieves level .0044 using a = .0028 (.05 
divided by 18 based on =2 and = 19). Sample size N = 100 achieves level .048 
using d = .0046 while the Bonferroni test achieves level .006 using d = .0005 (.05 
divided by 98 based on = 2 and k^ = 99). 

As an illustration, Figure 8 shows two cases for A = 100 at significance level 
a = 0.05 . The solid line is the critical envelope ^ obtained by recursion using (3.51)- 
(3.58). The null hypothesis case (homogeneity) is modeled by drawing all 100 points 

from a bivariate normal distribution BVN(pq,S) where Po = (0,0) and S is the 
identity matrix. The alternate hypothesis case (heterogeneity) is modeled by drawing the 
first 50 points from BVN(Pq,S) and the last 50 points from BVN(qj,S) where 

Pj = (3,0) . We choose a large mean jump for this example to emphasize the response of 
to a change point. Applying the NAP test, we reject the null hypothesis for any case 
where ^ ^. In this example, for the case of homogeneity never exceeds 

and so we do not reject the null hypothesis. In contrast, for the case of 
heterogeneity exceeds q^^ not just once but for numerous values of k, so ample 
evidence exists to reject the null hypothesis. 
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Critical envelope and example statistics (A/= 100, a= 0.05) 



Figure 8, Critical envelope for the NAP test and two cases of with 
A = 200 and a = 0.05, Reject the null hypothesis if exceeds 
for some k. 

3, A Graphical Example 

As we did for the SPM test, we illustrate the mechanics of the NAP test with the 
data presented in Figures 6 and 7. We compute the NAP test statistic for 

k = 2,...,\9 and consider whether or not this test rejects the null hypothesis at 
significance level a = 0.05 . Figure 9 shows critical envelope and for the data 
from Figure 6. We find the values easily by visual inspection; there are no pairings 

that occur within the first 2 observations, none in the first 3 observations, none in the first 
4 observations, 1 pairing within the first 5 observations, and so on. 
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k 


Figure 9, NAP test statistic for Figure 6 data, no change point detected. 

Since ^ 20 ^ 20 this case, we do not reject the null hypothesis. In 

contrast, Figure 10 shows q 2 o„ and test statistic Mjg for the data from Figure 7. Here 
we see that 20 > 20 tor ^ = 4,6, and 11. A single exceedance alone is sufficient to 

reject the null hypothesis, so in this case there is more than enough evidence to do so. 
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Figure 10, NAP test statistic for Figure 7 data, change point detected. 

We note here at least two potential advantages of the NAP test over the SPM test. 
First, by design, the NAP test allows one to narrow the window of the test for possible 
ehange points, which the SPM test does not allow. In other words, if there is prior 
information that allows a possible change point to be restricted to a sub interval [^o’^i] 
with kQ>2 and < N, then the critical envelope calculation is adjusted accordingly. A 

second advantage is that the NAP test gives information regarding not only if a change 
point exists, but also when it occurred. For any case where at least one exceedance 
exists, let k* = min ^ We expect that earlier change points would be 

identified by smaller values of k*. We examine the performance of the NAP test in the 
simulation study presented in Chapter IV. 
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c. 


THE ENSEMBLE SUM OF PAIR-MAXIMA TEST 


The minimum cost assignment obtained by optimal non-bipartite weighted 
matching is associated with a random sample; therefore, the assignment is optimal only to 
the specific data at hand. Another sample with the same underlying distribution(s) would 
almost certainly result in a different matching with respect to sequence labels. It is 
natural then to examine sub-optimal (but good) matchings for additional information 
regarding homogeneity, and evaluate whether the information in this ensemble of 
matchings yields greater power to detect whether a distribution change has occurred. In 
particular, we consider collections of matchings that are orthogonal, meaning they share 
no common pair (this is similar to the approach of Friedman and Rafsky (1979) where 
they examine orthogonal minimal spanning trees). We discuss a few properties of such 
collections as background and then introduce a test statistic based on collections of 
orthogonal matchings. 

1. Orthogonal Successive Optimal Matchings 

We use the term orthogonal successive optimal matchings to refer to matchings 
constructed by the following process: compute an optimal non-bipartite matching on the 
original data, then the next best matching that is orthogonal to the first, then the next best 
matching that is orthogonal to the first and second, and so on. Given N = 2n 
observations (we assume N even as before to simplify exposition) and some associated 
cost function, orthogonal successive optimal matchings have the following properties. 

Property 1 : At least N /2 matchings may be obtained by orthogonal successive 
optimal matching. 

Proof of Property 1 : The following lemma follows directly from Theorem 6.6 of 
Chartrand and Zhang (2005): “If a graph G has N = 2n vertices and each vertex has 
degree at least n, then G has a perfect matching.” Let Gq = (V,£’o) be the original graph 

on N vertices and A(A — l)/2 edges and let G,. ={y,E.) denote the subgraph whose 
edge set E. C consists of those edges that have not been utilized in matchings 1,..., /. 
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At the beginning of the matehing proeess, eaeh vertex in Gg has degree N — 1. After the 
first matehing N /2 edges are removed from Gg with eaeh vertex ineident to exaetly 
one sueh edge, so eaeh vertex in Gj has degree N — 2 and a perfeet matehing exists on 
Gj by the lemma. After N / 2 —I orthogonal sueeessive optimal matehings have been 
eomputed, G^/j-i degree (A —l) —(A/2 —1) = A/2 = n, so at least one more 
matehing exists by the lemma. ■ 

Property 2 : At most A — 1 matehings may be obtained by orthogonal sueeessive 
optimal matehing. 

Proof of Property 2 : From the diseussion in Property 1, eaeh vertex in graph G. 
has degree A — 1 — /. Therefore, G^_j has no edges, and no more sueeessive matchings 
exist. ■ 

The bounds associated with Properties 1 and 2 are both strong bounds in the sense 
that there exist cases where no more than A / 2 matchings may be obtained by 
orthogonal successive optimal matching, and also cases where exactly A — 1 orthogonal 
matchings may be obtained. The following examples demonstrate each case. 

Example for which no more than A / 2 matchings may be obtained : Let A = 6, 
and consider a regular hexahedron under a Euclidean cost function in three dimensions 
(that is, a polyhedron with 6 triangular faces and all edges of equal length). Label its five 
vertices as shown in Eigure 11. 


1 



Figure 11. Example for which no more than A/2 matchings may he obtained. 
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Without loss of generality, assume each edge length of this shape equals 1. Now insert a 
sixth point in the center of this shape (that is, at the midpoint of the segment connecting 
the vertices 1 and 5) and edges to connect it to all other five vertices. The resulting cost 
matrix is given in Table 2: 


c,j 1 2 3 4 5 6 

1 

2 

3 

4 

5 

6 

Table 2, Cost matrix for the regular hexahedron in Figure 11, 

It is quickly verified that an optimal matching in this case pairs vertices 1,5, and 
6 with any one of vertices 2, 3, and 4 so as to make a matching. Regardless of 
tiebreaking procedure, the first three successive matchings exhaust all pairings of 1, 5, 
and 6 with 2, 3, and 4. To obtain a fourth matching, vertices 2, 3, and 4 may only be 
paired among each other, in which case no partner exists for the third. Therefore, no 
more than the N / 2 orthogonal successive optimal matchings guaranteed by Property 1 
can be constructed. 

Example for which exactly N - \ matchings may be obtained : Let N = 4 , and 
consider a square under a Euclidean cost function with its vertices labeled as shown in 
Eigure 12: 


1 4 



Figure 12, Example for which exactly - 1 matchings may be obtained, 
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By inspection it is clear that, in order, |{l,2},{3,4}} , |{l,4},{2,3}} , |{l,3},{2,4}} is 
an example of — 1 = 3 orthogonal successive optimal matchings. 

In fact, the set of all possible collections of — 1 orthogonal successive optimal 
matchings is isomorphic to the set of all symmetric Latin squares of order N with the 
property that the integer N is on the diagonal. Recall that a Latin square of order N is 
an array consisting of the integers {l,..., A/^} such that each integer occurs exactly once in 
each row and once in each column. The isomorphism may be described as follows. 
Denote the N— I orthogonal successive optimal matchings by |Mj,..., M^_j|, where 

is thematching and u.j and y^j are the minimum and 

maximum sequence labels, respectively, of the pair listed in the matching. Enter the 
integer j in an NxN array at entries (w,y,y,^) and for all and all 

j — , and enter N on the array diagonal. The resulting Latin square carries 

the information indicating the stage in the succession of matchings at which the 
observation whose sequence number corresponds to row (column) a is paired with the 
observation whose sequence number corresponds to column (row) b, a . We 
illustrate the isomorphism with a very simple example on N = 6 observations in 
Figure 13. 

5 successive matchings 

M ,= {{1,2},{3,5},{4,6}} 

M,= {{1,3},{2,6},{4,5}} 

M 3= {{1,4},{2,3},{5,6}} 

M ,= {{1,5},{2,4},{3,6}} 

M ={{1,6},{2,5},{3,4}} 

Figure 13. Five orthogonal successive optimal matchings on N = 6 
observations with associated Latin square. Cost function does not 
necessarily satisfy the triangle inequality. 


6x6 Latin square 
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2 
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2 

1 

4 

5 

1 
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6 

3 

5 

2 

4 

1 

3 
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In our research, most random samples we observed of size N admitted A^ —1 
orthogonal successive optimal matchings. However, the fact that this is not always the 
case requires that our theory accommodate cases where less than A^ —1 orthogonal 
successive optimal matchings are admitted. 

2, The Statistic 


We proceed to formulate a test statistic based on orthogonal successive optimal 
matchings. Let . denote the sum of pair-maxima statistic associated with the best 

orthogonal matching. It is straightforward to show that ^ is marginally distributed as 

. We prove this for the case of continuous random variables. 


Proof : Let be some standard ordering of observations 


(Xj,...,X^) based solely on data content (e.g., an ordering based on observation norm) 


and let C be the cost matrix with respect to this standard ordering. Let ^ denote the set 
of all N xN 0-1 matrices associated with perfect matchings on N observations; that is, 
V is the set of all symmetric matrices that have a “1” entry for pairings in a matching 
and “0” otherwise. Two matchings U,V G "V are orthogonal if and only if 1/ oL = 0, 
where “ o ” is the Hadamard (coordinate-wise) product of U and V . Then the optimal 
non-bipartite matching problem can be expressed as 


(3.59) 


min tr (Cy) 
subject to y G "V, 

where tr(A) denotes the trace of matrix A. Now let yj* be the solution to (3.59), and 


define nested sets = orthogonal solutions V 2 ,V^ , and 


objective values z* for / G {l,..., N / 2} recursively as follows: 
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V* solves z* = min tr (CV) 

subject to y G "Vj, 

V 2 solves zl = min tr (CV) 

(3.60) subject to V G = jv G "V: V o Vj’ = o}, 

V* solves z* = min tr (CV), 

subject to V G^Vj = |y G^ViV oV* = 0 V 7 G {l,2,.. 

For continuous random variables, z^ < z* 2 < -< z* with probability 1. 

Now, for any permutation tt applied to the integers {l,...,A/^}, let denote the 
associated NxN permutation matrix. If the order of is permuted by tt , 

then the cost matrix for the permuted observations is A^CA^. Define nested sets 

orthogonal solutions Vj 2 , yj, 3 , • • •, Vj,, and obj ective values 
’ ^1,2 ’ • • • ’ ^ 1,1 for / G { 1 ,..., / 2 } recursively like before: 

yJi solves z* 1 = min tr(Qy) 

subject to y G 1 , 

(3.61) : 

Ki solves z*,. = min tr (Qy), 

subject to y G ,. = |y G "V :y ■ = 0 y/ G { 1 , 2 ,...,/— 1 }|. 

Note that tr(C^y) = tr(A^CA^y] = tr(CA^yA^] for all yG^ by the permutation 
invariance of the trace operator. 

We now show by induction that for any i, V* ■ = A V*A'. The assertion is true for 

J J ' -K^l TT I TT 

/ = 1, since AV* A'^ G "V = "V ., 

‘ TT 1 TT TT,! ‘ 

(3.62) z*. = min tr(C y) = min tr(CA'yA^) = min tr(Ct/) = z,*, 
and 

(3.63) tr(c,(AXy)) = tr(CV;) = z,-. 
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Now suppose Vlj = \V*Al for all 7 G {l,..., fc} , k<N ! 2. Then 


■V,,.*. = {v€V:Vov;j = 0 Vy€{ 1 , 2 ,....t}} 

= {veV:A:(VoV;j)A, = 0 Vje{l,2,.,.,t}} 

= {v e V: ( a;va, ) o (a;v;/, )=0 v y e {1, 2, .,., * 
= {V€V:A;V'A,oiy = 0 vy€{l,2,...,t}}. 


Therefore, the problem 
(3.65) 


mintr(Qy) 
subject to y e 


has the same solution as 


(3.66) 


min tr(C4;yAj 
subject to A'JA^ e , 


and the solution to (3.66) is V = A^y/_^jA'. Therefore, yj^+i = A^y/_^jA', and thus by 


induction V*,=Ay*A' for any i. 

t:,i 7T i tt J 


Finally, let p be the antirank vector for the ordered observations so that 
. Then for any i, if V* solves the orthogonal matching problem with 

respect to the standard ordering, then V*^ is the solution with respect to the original 

ordering. Conditional on , p is uniformly distributed over all Nl possible 

permutations applied to the integers {l,...,A/^} under the null hypothesis. So, each 

element of "V is equally likely to be the orthogonal successive optimal matching, and 
thus each possible labeling of vertices in that matching is equally likely. Therefore, ■ 

is marginally distributed as . ■ 


By Property 1 above, . is well-defined for all i <N / 2. We define 5^ ^ 
the cumulative sum of pair-maxima over the first k matchings: 

(3-67) V.=E7'».,- 


i=l 


to be 
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Likewise, 5^ is well-defined for all k <N /2. The mean of 5^ ^ under the null 
hypothesis is computed directly: 

r 1 ^ r 1 A^(A^ + 1 ) 

(3.68) = £[S„ ,] = \ ■ 

i=\ ^ 


Just as takes on smaller values under alternative hypotheses than under the 
null hypothesis, we expect that under alternative hypotheses 5^ ^ will tend to deviate 
below its mean value. Therefore, we define 


(3.69) 


= max 

ke{l,2,--;N/2} 




-s 


N,k 


to be our Ensemble Sum of Pair-Maxima (ESPM) test statistic, where 
(3.70) 

is a scaling constant whose choice is motivated in the following section. So, 
measures the maximum cumulative deviation of the . from their mean over N / 2 
orthogonal successive optimal matchings. For convenience we choose to be the 
negative of the typical centered random variable so that smaller values of 5^ ^ (which are 
evidence of an underlying distribution change) correspond to larger values of . 


3. Brownian Bridge Motivation for Kj^ 

The formulation of the statistic is based in part on structural similarities of 
the sequence to a Brownian bridge. Recall that a stochastic process 

{M'W. t>0| is called a Gaussian process if jj has a multivariate 

normal distribution for all (tj ,..., t J and for all 7 G {l, 2,...}, and that a Gaussian process 
{fi(t),0<t<l} with £'[fi(t)] = 0, 0 <t<l, and Cov(fi(5),fi(t)) = ^(l —t), 
called a Brownian bridge (Ross, 2003, pp. 622-623). 


63 



We desire an expression for the varianee of 5^^ to eompare the sequenee 
to a Brownian bridge. Sueh an expression depends on the eovariance 
between the whieh is diffieult to determine analytically. However, simulation 
suggests that Cov(r^,.,r^ J = Cov(r^ 2) for all / ^ j, so for the remainder of this 
section we assume this to be true for the sake of comparison to a Brownian bridge only. 

Under the assumption that Cov(r^J = Cov(r^ 2 ) for all i ^ j, it 
follows that 


(3.71) 



MV.l = *‘^H2EEcov(r„,,r„j) 

i=2 j=l 

+k(k-l)Cov(r^_i,r^_2) 


where the underscore-tilde notation indicates that equality depends on our 

covariance assumption. It is straightforward to solve for Cov(r^j,r^ 2 ) under this 

assumption by observing that in any case for which N — \ orthogonal successive optimal 
matchings can be constructed, every possible pairing of observations has been 
considered. For this case then, 

(3.72) = (A'-1) 


which is constant for fixed N. Therefore Var [5^ ^_j] = 0. Applying this boundary-value 
condition to (3.71) gives 

(3.73) 0 = Var[s ^^^_,] = (A -1)+ (A -1)(A - 2)CovV A , 


so 

(3.74) 


Cov(r^_i,r^2)^ 




(A-2) 


A(A-2)(A + 1) 
2(A-2)90 


A(A + 1) 
180 


V A. 


Therefore, we obtain the desired result 
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^N,k ~ 


(3.75) 


kal+k[k- 1) Cov (r^ ^) 
A^(A^ + 1) 


= kal —k{k — \) 

^ ^180 

_ kN{N + \){N-k-\) 

~ 180 


Finally, define 5^ g = 0 and ^ = 0, let t = ^ / (— l) for e {l,..., / 2}, and 

define stoehastic proeess (t) by 

1 2 N/2 


(3.76) 


, m. 


' N-l’ N-l’"'’{N-\) 

Then for all 5 and t sueh that s<t and G {0,1/(A/^ —l),..., A/^/(2A/^ —2)} and for all 
even N we have 

( 3 . 77 ) 5 ^( 0 ) = 0 , 


(3.78) 


£|5^U)I = 0, 


and 


65 



Cov[5^(5),5^(?)] = Cov 


^N,s(N-l) ^N,s{N- 1) ^N,t(N-l) ^N,t{N-l) 


/(iV-l) 

(l/c^jCov Tj^j 

j=s{N-\)+\ 

(1/4) Vark,,„_J + Cov 

1=1 J=i(A'-l)+l 

j{iV-l) i ( n -\) 

(1/4) Var[ 5 „,,2 E 

1=1 ;=.s(W-l)+l 

i{iV-l) f(lV-l) 

5'w,s(a^-i) "I" ^ ^ Cov[r^j,r^2| 

i=l ;=.s(W-l)+l 


= 5(1-5) + (1/c^)5(A^-1)(?-5)(A^-1) 


A^(A^ + 1)' 


(3.79) 


= 5(l-5)-5(?-5) 
= ^( 1 - 0 - 


This structure of (t), t G {0,1/(A^ —l),...,7//(2A^ —2)}| for our choice of and 

equal covariance assumption suggests a connection to the Brownian bridge. 

Shorack and Wellner (1986) present several useful results pertaining to the 
Brownian bridge, including the following; 


(t), 0 < t < 1 } is a Brownian bridge, then for all a,h > 0 

< a(l —+ for 0 < 5 < t < M < 1 15(5) = .r and 5 (u) =31 

(3.80) 

= 1 —exp|—2[a(l — 5 ) d-hj' — .r][a(l —m) + hM — j] /(w — 5)|. 


Setting a = h = A>0 and 7 = x = 0 


(3.81) 


P[B{t 


|<AforO<t<M<l \B{u)=y j 

1 —expj—2A(A — 7 )/m} , where B(u) ~ A^(0,m( 1 —w)]. 
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Taking the expected value over B{u) yields the identity 
for 0 < t < M < 1, A > O) 


(3.82) 


— / 
J 


yJlTTU (l —m) 


yju(l—u 


1 — exp' 


2A(A-y)' 




I 


1 —exp 


2A (a — x^u(l — u)' 


= q> 


= T> 


A 

1 

yju (l — m) 

■42% 

A 

-exp{ 

^Ju (l —m) 




exp 


expj 

2 

2u (l —m)J 

-m)) 

- 

expj 

^ 0 

2X^u (l —m) 


dy 


dx 


expj—2A^} Jx 


A(2m-1) 


^u(\ — u) 


where is the standard normal cumulative distribution function. For w = 1/2, equation 

(3.82) reduces to 

(3.83) P(5(t)<A for 0<t<l/2,A>0) = $(2A)-|exp{-2A"}. 

Now define K = sup^^jQ B (t) for Brownian bridge {5 (t), 0 < t < l}. It follows 
directly from (3.83) that the critical value corresponding to P[K > — a is a 

solution to 

(3.84) l-a-T>(2x) + ^exp{-2x'} = 0, 

which can be well-approximated using standard computing software. In other words, the 
null distribution of K is known and so K is an obvious choice for a test statistic if a 
process of interest is a Brownian bridge. The question at this point then is, “Does the 

process (t), t G {0,1 / (— l),..., / [2N — 2)}| asymptotically approach a Brownian 

bridge?” Under our covariance assumption, this process has the same mean and 
covariance structure as a Brownian bridge for all N. Furthermore 5^ (t) is by 
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construction a shifted, scaled sum of random variables whose marginal distributions are 
asymptotieally normal. Nevertheless, we are somewhat surprised to observe that {t) 
is not normal even for fairly large N. Figures 14-18 show normal quantile-quantile plots 
of for 10,000 simulations associated with the null hypothesis, using a standard 

bivariate normal distribution, A^ = 300, and t = kl{^N — \) for 
/c = 1, 10, 30, 100, and 150. Figure 14 shows k = \, whieh of course is simply 
(r^—We have proven already that Tj^ is asymptotically normal, and indeed 
Figure 14 is confirming evidence. 


QQ Plot of Sample Data versus Standard Normal 



Figure 14, Quantile-Quantile plot of 5^ [k! (A/^ —l)j for 10,000 simulations of 

N observations from a standard bivariate normal distribution; 

A^ = 300, k = \. 

As k varies from 1 to N / 2 , 5^ (t) exhibits markedly non-normal behavior as 
demonstrated in panels Figures 15-18. In Figure 15, for k = \0 (corresponding to 
t Tn 0.033) 5^ (r) shows signs of slight positive skewness. This skewness is much more 
apparent as k approaehes the middle of the interval as seen in Figures 16-18 for k = 30 
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(?!^0.10), /r = 100 (?!^0.33), and ^ = 150 (?f«0.50). In other words, B^{t) 

constitutes an unusual “natural” example of a case where the distribution of the sum of a 
large number of asymptotically normal random variables does not appear to approach a 
normal distribution. 


QQ Plot of Sample Data versus Standard Normal 



Figure 15, Quantile-Quantile plot of 5^ [k! (A/^ —l)) for 10,000 simulations of 

N observations from a standard bivariate normal distribution; 

Ar = 300, k = \Q. 
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QQ Plot of Sample Data versus Standard Normal 



Figure 16, Quantile-Quantile plot of 5^ [k! (A/^ —l)) for 10,000 simulations of 

N observations from a standard bivariate normal distribution; 

A^ = 300, k = l>Q. 


QQ Plot of Sample Data versus Standard Normal 



Figure 17, Quantile-Quantile plot of 5^ [k! (A/^ —l)) for 10,000 simulations of 

N observations from a standard bivariate normal distribution; 

A^ = 300, k = m. 
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QQ Plot of Sample Data versus Standard Normal 



Figure 18, Quantile-Quantile plot of 5^ [k! (A/^ —l)) for 10,000 simulations of 

N observations from a standard bivariate normal distribution; 

A^ = 300, k = \50. 

Of course, this evidence alone does not prove that 5^ (t) fails to approach a 

Brownian bridge in the limit; however, it does establish that even for fairly large N a 
Brownian bridge approximation may not be a good one. Indeed, simulation shows that 
for fairly large N (from 100 to 300) a Brownian bridge approximation for (t) gives 

reasonable tail probability estimates for a near .10, but less so for smaller values of a. 
Table 3 shows tail probability results based on 10,000 simulations of N observations 
from a standard bivariate normal distribution for cr = 0.10, 0.05, 0.025, and 0.01 where 
the achieved test level for each combination of N and a is the fraction of simulations for 

which = where is a root of (3.84). 
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a 


Achieved test level 

0 

0 

A = 200 

A = 300 

0.10 

0.9757 

0.0877 

0.0980 

0.0984 

0.05 

1.1334 

0.0586 

0.0649 

0.0635 

0.025 

1.2731 

0.0385 

0.0432 

0.0419 

0.01 

1.4382 

0.0248 

0.0278 

0.0266 


Table 3. Achieved test levels for j ^ 2 } {kl{N — l)) using Brownian 

bridge critical values for 10,000 simulations of N observations from a 
standard bivariate normal distribution. Achieved test level for each 
combination of N and a is the fraction of simulations for which 

max,,{02,...,^/2} (kl{N- 1)) > . 

Because the null distribution for Kj^ is difficult to obtain exactly, we obtain 

useful tail probability approximations by simulation. These tail probability 
approximations depend on both sample size N and dimensionality d. Approximate 
critical values for based on simulation are provided in Appendix B for various values 

of N , d , and a. As indicated in the SPM test discussion, the ESPM test proves to be 
significantly more powerful than a single SPM test. We show performance results in the 
next chapter. 

We close this discussion of an SPM-based ensemble statistic with the comment 
that it is less clear exactly how to formulate an analogous ensemble extension for the 
NAP statistic. In that case, it seems natural to find the sequence of orthogonal best 
matchings as in the ESPM case and then compute for each matching. Eetting . 

denote the statistic associated with the best orthogonal matching, one would 
expect to be able to extract more change-point information out of the collection of vectors 
|m^ j , 2 ’ • • • ’ } than out of the NAP test using j alone. Eor now, we leave 

study of an ensemble NAP statistic for future work. 
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D. 


THE BIPARTITE ACCUMULATED PAIRS TEST 


The tests introdueed thus far are based on non-bipartite weighted matehings, and 
they test for a ehange point over an entire set of observations for whieh there is no eontrol 
set. That is, no prior information exists regarding the in-eontrol distribution. In many 
cases, however, some history of observations which are known (or assumed) to be in¬ 
control is available, and the problem is to determine whether a change point exists in a set 
of future observations. Therefore, we propose the Bipartite Accumulated Pairs (BAP) 
test for cases where in-control observation history is available. As the name indicates, 
the BAP test is constructed using bipartite matchings (as opposed to non-bipartite 
matchings as in all our previous tests). Recall that a bipartite matching pairs observations 
from one pre-designated group with observations from another, and is a solution to the 
integer program (2.13). 

1. The Z Statistic 

Assume we have some history {Aj,...,A^} of m control observations, and we 

desire to test whether a change point exists in a sequence (,..., ) of n new test 

observations. One approach to this problem is to estimate the in-control distribution 
based on the observation history and then test whether it is likely that the new 
observations are drawn from the estimated distribution. An alternative matching-based 
approach is to compute an optimal bipartite matching between the control and test 
observations and use the information in the matching to test whether a change point exists 
in the test data. 

We employ the following approach for the case m<n (more test data than 
control data; we discuss the m>n case later): Compute a minimum-weight bipartite 
matching which pairs each control observation with some test observation, based on 
some predetermined cost function. We emphasize here that some test observations will 
necessarily remain unpaired, unlike the non-bipartite matching case where at most one 
observation is left unpaired. Define random variables Zj,...,Z^ j where Z^ is the 
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number of test observations among whieh are paired with eontrol 

observations. Notice that this test has very much the same flavor as the NAP test, as it 
counts the accumulation of pairs in the optimal matching with respect to the order of the 
test data. By construction the are dependent random variables each of which is 
marginally distributed as hypergeometric. In fact, =Z^_j+(5'^, where <5'^ =1 if test 
observation Z^^^ is paired with a control observation and S^=0 otherwise. So, 



This test is designed to test null hypothesis H^:F = G against alternative 
H,:F^G where X„...,X^-F, X ^^,,..., ~ G , and F and G are unknown. If a 

change point is present in the sequence of test observations, one expects that more pairs 
will be formed between the first k test observations and the control observations than if 
no change point is present, so the existence of a change point should be seen in pairings 

- 

being “front-loaded” in the Z^ sequence. We call the vector Z = (Zj,...,Z„_jj the 
Bipartite Accumulated Pairs (BAP) test statistic. 


2. Critical Envelope 

We develop an exact simultaneous test for Z in a manner closely following that 
for of the Non-Bipartite Accumulated Pairs (NAP) test. Specifically, we seek a 

vector of non-negative integers so that the following is true for a given 

test level a : 

(3.86) P(Z^ < <7^, 1 < k < n-l) > 1-a . 

This allows us to construct a simultaneous level a test. As for the NAP case, we take q^. 
to be the I-a quantile of the distribution of Z^ (hypergeometric) so that the non- 
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simultaneous test at stage k has level a; that is, P[Z^>qi^)<d for each 
k . Then we use a recursive computational scheme to select d that gives a 

simultaneous test level as close to a as possible without exceeding this value. To 
develop the recursion, first note that the d -quantiles satisfy the properties 
q^. = min{r: <r)> d} and q^<q 2 <---< q„_i. Now using the same conditioning 

approach as in (3.52)-(3.58), we have 

* 

(3.87) P(Zj <q^,l<j<k) = ^7r{r,k)-P{Z, = r) 

r=0 

where 7r[r,k) = P[Z^<q^,...,Z^_^<q^_^\Z^=r) and P[Z^=r) is given by (3.85). 
Observe that ;r(r. A:) observes a simple recursive relationship expressed by the following 
lemma: 

Lemma 3-6 : 

^{r,k) = j;T{r-\,k-l)l{r-\<q^_^)+^^;v{r,k-\)l{r<q^_^), 

(3.88) k k 

k = 2 ,..., max: q. < min(/,m)|; r = 0 ,..., ^^ . 

Proof of Lemma 3-6 : 

= = 5,Z, =r)-P(Z,_i <5|Z, =r) 

= Y^P(Zj<qjA<j<k-2\z^_^ = s)-P{Z^_^<s\Z^=r) 

(3.89) =^7r(5,/c-l)-P(Z,_i <5|Z, = r)-I [s <q,_^) 

= 7r(r-l,/c-l)-P(Z,_j = r-l|Z, =r)-/(r-l<^,_i) 

+7v{r,k -!)■ P[Z,_^ = r\Z, = r)-1{r < q,_,) 

Under the null hypothesis, Zi^= r implies that each of the k observations ,..., 
is equally likely to be among r pairs in the bipartite matching; therefore 

P(^k-i=r-'^\Z,=r) = j and p(z,_i = r|Z, = r) = ■ 
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The recursion starts at k = 2 and continues to A: = max|/: <5^; < min(/,m)|. For 
k = 2 wQ have 

0 <r<^ 2; 

0, r>q^. 

This recursion scheme is quite readily implemented in S-PLUS®, R, MATLAB®, or 
other interpreted languages; an implementation for R is included in Appendix C. 

The simultaneous test presented here is clearly an improvement over the 
Bonferroni method. For example, with m = 25 control points and n = 50 test points a 
nominal 0 = 0.05 simultaneous test achieves test level .047 using uses a = .011. By 
contrast the Bonferroni method achieves test level .006 using a = .05/49 = .001. The 
improvement is even more dramatic in larger samples. For m = 500, n = 1000, the BAP 
test achieves level .049 using d = .00203 compared to an achieved level of .0018 using 
« = .05 / 999 = .00005 by Bonferroni. 

3, A Graphical Example 

Figure 19 shows 20 observations all drawn from a standard bivariate normal 
distribution. The plot marker for each point is its sequence label. The first m = 4 points 
are the control set and are circled for emphasis; the last n = 16 points are the test set. 
Since all the test data are from the same distribution as the control data, there is no 
change point. We use a MATLAB® implementation of the Jonker-Volgenant 
assignment algorithm (1987) provided by Levedahl (2000) to compute an optimal 
bipartite match with respect to Euclidean distance for these data; the resulting pairs are 

shown connected by line segments. Table 4 shows the critical envelope 

and the test statistic Z = (Zj,...,Zj5j for the data in Figure 19. Recall that k = \ 

corresponds to , k = 2 corresponds to A^^j = so on, and Z^ is equal 

to the number of pairings of between the sets {Aj,...,A 4} and {A^,..., A5^^_j}. For this 

small data set we can determine the Z^ values by inspection from Figure 19. Since 
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(3.90) 


n 


(r,2) = 



Xj, Xjg, andare the test set elements paired with elements in the control set, 

Zj=Z 2 =l, Z^=Z^=Z^=2 , Zg = ••• = Zjo = 3 , and Zjj = •■• = Zjj = 4 . None of 

these values exceeds the critical envelope, so we do not reject the null hypothesis that all 
the test data share the same distribution as the control data. 



Figure 19, Minimum weight bipartite matching on 20 points; m = 4 and 
n = 16 with no change point. The control set is circled; line segments 
connect observations that are paired in the optimal bipartite 
matching. 


k 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

(Ik 

1 

2 

2 

3 

3 

3 

3 

4 

4 

4 

4 

4 

4 

4 

4 

z, 

1 

1 

2 

2 

2 

3 

3 

3 

3 

3 

4 

4 

4 

4 

4 


Table 4, Critical envelope and BAP test statistic Z for Figure 19 data, Z^ 
never exceeds , so the null hypothesis of no change is not rejected. 
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In contrast, Figure 20 shows the same control data, but the test data are different 
in that a change point exists at r = 13, where the index r is in referenee to the pooled 
dataset {Xj,...,X2o}. Speeifieally, observations {X5,...,Xj2} are the same as in the no¬ 
change case, but {Xj3,...,X2o} are translated by 2 units in both dimensions from the no¬ 
change ease. The eontrol data are cireled as before, and the data affeeted by the change 
point are cireumscribed by a box to elearly show the mean shift. 



x1 


Figure 20, Minimum weight bipartite matching on 20 points; m = 4 and 
n = 16 with a change point r = 13. The control set is circled; post¬ 
change point observations are boxed. Line segments connect 
observations that are paired in the optimal bipartite matching. 
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k 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

klk 

1 

2 

2 

3 

3 

3 

3 

4 

4 

4 

4 

4 

4 

4 

4 


1 

1 

2 

3 

3 

4 

4 

4 

4 

4 

4 

4 

4 

4 

4 


Table 5, Critical envelope and BAP test statistic Z for Figure 20 data. 
Z^> q^, so the null hypothesis of no change is rejected. 

Table 5 shows and test statistie Z for Figure 20. Sinee Z^> q^, we rejeet the null 
hypothesis in favor of the alternative that a change point exists in the test data. 

An alternative to the BAP test, as we have formulated it here, is required for the 
case m>n, where there exist at least as many control points as test points, since the BAP 
test would pair every test point to a control point in such cases. A matching-based 
possibility appeals to previous results: First, assign to each point in the control set some 
measure of centralness or depth relative to that set; call this measure the observation 
“quality.” Then compute an optimal bipartite matching and let Q j denote the quality of 

the control observation that is paired in the matching with test observation . Since 
m>n, every observation in the test set is paired with some observation in the control set, 
resulting in the sequence ■ Now assign ranks to this sequence and perform a 

change-point test on it based on ranks. The existence of a change-point in the rank 
sequence corresponds to the existence of a change-point in the test set. 

The BAP test invites consideration of an ensemble extension similar to the 
ensemble extension of the NAP test. That is, compute the BAP statistic for each optimal 
matching in an orthogonal sequence of optimal matchings, and evaluate the collection 

|Zj,Z2,...,Z^_j| for additional change-point information. We do not explore this 

concept here; rather, the proposed BAP test and associated discussion only begin to 
examine what appear to be rich opportunities to apply bipartite matching ideas to the 
change-point problem, and we mention them here to inspire further research. In the next 
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chapter, we will examine the performanee of our proposed non-bipartite matching 
methods only, and leave an in-depth study of bipartite methods for future work. 
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IV. SIMULATION STUDY 


A. METHODOLOGY 

The usefulness of the tests presented in this paper lies in their power to identify 
change points under a wide range of alternative hypotheses. In this chapter, we present 
the results of computer simulations that compare the power of the Sum of Pair Maxima 
(SPM), Non-Bipartite Accumulated Pairs (NAP), and Ensemble Sum of Pair Maxima 
(ESPM) tests for a variety of scenarios. We compare these tests to the James, James, and 
Siegmund (JJS) (1992) test discussed in Section II.B.2. In every case the sample space is 
(where d is observation dimension), sample size is A = 200, and test significance 
level is a = 0.05 . The choice N = 200 is based on a desire to investigate test behavior 
for a moderately large sample size while avoiding excessive computation times for large 
simulations. Detection power is the performance metric, where power is defined as the 
probability of rejecting the null hypothesis when it is false. Each power estimate in the 
tables of this chapter is the fraction of times that a particular test indicates that a change 
point has occurred under the given conditions, based on 1000 simulations. We use the 
Mahalanobis distance function given by (2.16) (estimating E by the sample covariance 
matrix) as a natural choice to determine interpoint costs in unless otherwise specified. 
Section C.4 of this chapter shows performance results for cases where different cost 
functions are considered. In all other cases, the scenarios vary according to the following 
factors: 

1) Underlying distribution family. We consider the following probability 
distribution families: 

a) multivariate normal, denoted , 

b) a multivariate normal mixture, denoted , as a heavy-tailed case, 
and 

c) a multivariate Weibull distribution, denoted , as a skewed case. 
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2) Dimension. We evaluate two dimensions: d = 5 and d = 20 . 

3) Change-point location. We examine eases where the ehange point occurs in 
the middle and toward the end of the observation sequence. 

4) Change parameter. We consider changes in distribution mean and scale. 

5) Type of change. At a change point, we consider cases where the parameter 
undergoing change does so in an abrupt Gump) or gradual (drift) manner. 

6) Magnitude of change. We examine changes of various magnitudes. 


(4.1) 


Specifically, 

/mVN 


is the cumulative distribution associated with density function 



where p 6 and E G are the distribution mean and covariance matrix, 
respectively. The in-control data for the multivariate normal case have p = 0 and 
S = 7^, where 1^ is the dxd identity matrix. The post-change-point data have a 
different mean or scale as specified in the next section. 


is constructed as follows: Let U ~ , let Z be a Bernoulli random 

variable with success probability p ^.^, and let be some scalar larger than 1. Then 
the random variable 

(4-2) X=[l + (ff.,„-l)Z](;~F„., 


and p^^^ and are the proportion and scale of the mixing, respectively. For this 
study, we set p^^^ = 0.1 and <7^;^ = 4 . 

F^gibis defined to be the distribution on associated with d independent 
identically distributed univariate Weibull random variables; that is, 

x = (xi,...,x,)g]R+, 

0 otherwise, 


(4.3) 


fw,ib(x;>?.P)= 


n 


1 —exp 


A 

A 


\v 


82 



where rj and (3^ are the univariate Weibull shape and seale parameters, respeetively, and 
denotes the elosed upper half-spaee of M"'. For this study we set 77 = 1.5. For the 
in-control data, P = 1 ; P varies for the post-change-point data as specified in the next 
section. 

We consider a change point at r = 101 (for a 50-50 percent split of pre- and post¬ 
change data) and r = 161 (for an 80-20 percent split of pre- and post-change data). We 
examine change magnitudes, denoted by A, ranging from 0 to 1.0 by increments of 0.25. 
For the case A = 0 the null hypothesis is true and the tabulated power at A = 0 is an 
estimate of the test’s false alarm rate, which is the likelihood that a test rejects the null 
hypothesis when it is in fact true. False alarm rate ideally is 0.05 at significance level 
a = 0.05 . For the case A > 0, A indicates the total magnitude of the change from the 
first to last observation. So, for a jump change A is magnitude of the abrupt change that 
occurs at change point r. For a drift change, the parameter undergoing change varies 
linearly from its reference level so that the total change magnitude between the first and 
last observation is A. For changes in distribution mean, we simulate the change in the 
first component of each observation only. For changes in distribution scale, we simulate 
the change in all components. Figures 21-28 shows cases of mean and scale changes of 
jump and drift variety for change points at t = 101 or t = 161 associated with the 
multivariate normal case for illustration purposes. The x-axis is sequence label i; the y- 
axis is the value of the first component of the observation. 
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Example of jump change in mean: A =1; r=101 


X' 


Observation 
Mean vaiue 


20 40 


BO 80 100 120 140 

Observation sequence label, / 


160 1B0 200 


Figure 21, Typical change scenario: mean jump at r = 101. The mean jumps 
in the first dimension only and remains fixed in all other dimensions. 


Example of drift change in mean: A =1; -=101 




Observation 
Mean value 


20 40 60 80 100 120 140 160 160 


Observation sequence label, / 


Figure 22, Typical change scenario: mean drift at r = 101, The mean drifts 
in the first dimension only and remains fixed in all other dimensions. 
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Figure 23, 


X' 


Figure 24, 


Example of jump change in mean: A =1: r =161 


Observation 
Mean value 


0 .- .-. ; ... 


20 40 


60 80 100 120 
Observation sequence label, / 


140 160 180 200 


Typical change scenario: mean jump at r = 161, The mean jumps 
in the first dimension only and remains fixed in all other dimensions. 


Example of drift change in mean: A =1; -=161 


Observation 
Mean value 


20 40 60 80 100 120 140 160 160 200 


Observation sequence label, i 


Typical change scenario: mean drift at r = 161, The mean drifts 
in the first dimension only and remains fixed in all other dimensions. 


85 





















Example of jump change in scale: a =1: r=101 


• Observation 


- +/-1 sd 


" 

* 


♦ *' -— '♦ ♦♦♦ * 

_* *♦*** *♦ * 

* .. , * * * ** • 

- 

, ' • ’ . • * ’ 



_1_1_ 

_1_1_1_1_1_1_1_ 


X- 


20 40 


60 80 100 120 140 

Observation sequence label, / 


160 180 200 


Figure 25, Typical change scenario: scale jump at r = 101, The scale jumps 
in all dimensions. 


Example of drift change in scale: a =1; -=101 

• Observation 

- +/-1 sd 



0 20 40 60 80 100 120 140 160 160 200 

Observation sequence label, / 

Figure 26, Typical change scenario: scale drift at r = 101. The scale jumps in 
all dimensions. 
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Example of jump change in scale: a =1; r=161 


1-1“ 


----— t - ^ -- ♦ - i 


X' 


Observation 
-+/-1 sd 


20 40 


BO 80 100 120 140 

Observation sequence label, / 


_l_1_ 

160 180 200 


Figure 27, Typical change scenario: scale jump at r = 161, The scale jumps 
in all dimensions. 


Example of drift change in scale: a =1; r=161 


“!-1“ 




• Observation 
- +/-1 sd 



20 40 


60 80 100 120 140 160 160 200 

Observation sequence label, / 


Figure 28, Typical change scenario: scale drift at r = 161. The scale jumps in 
all dimensions. 
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B, 


PERFORMANCE RESULTS 


Tables 6-13 show power estimates for the Sum of Pair-Maxima (SPM), Non- 
Bipartite Aceumulated Pairs (NAP), Ensemble Sum of Pair-Maxima (ESPM) and James, 
James, and Siegmund (JJS) tests for a variety of scenarios at test level a = 0.05. 
Attained significance levels for the SPM and NAP tests with N = 200 are 
ftjpM = 0.04962 and aj^^p = 0.04957 . ESPM critical values are obtained via simulation 

(see Appendix B). JJS critical values are analytically based on the assumption that the 
underlying distribution is multivariate normal with a common but unknown covariance 
matrix. While the JJS test is analytically designed to detect abrupt mean changes only, 
we observe in our simulations that it is also sensitive to gradual mean changes and scale 
changes while maintaining a false alarm rate consistent with test significance level. 
Therefore, we consider the JJS test for comparison in those cases as well. 


Each table specifies the distribution, dimensionality, change parameter (mean or 
scale), and change type Oump or drift) along the top. The left-most column shows the 
total magnitude of the change for the varying parameter in that case. The change 
magnitude at i is = A for jump changes and A^ = (/ —T-t-l)A/(A —t-|- 1) for drift 


changes. Eor a mean change at change point r, observation i is distributed as follows: 

, ' I for the MVN case; 

A,.-(A.,0,0 ,...,0 )~E^v^ , />tJ 

(4.4) 

A.. ~ F^:. . i<T 1 

for the mixture case. 


X,-{A.,0,0,...,0)-F^.^, i> 


r\ 


We model changes in the mean vector by changing only its first component because of 
the rotational invariance of F^^y^ and F^.^. Eor a scale change at change point r, 
observation i is distributed as follows: 
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i < T 


(4.5) 


Y P 

-^MVN ’ 


X, 


1 + A,. 


F i ^ T 

MVN ’ ' 


for the MVN case; 


—— - F , i> T 

1 + A. “ 


for the mixture case. 


For fixed rj , a change in p results in a mean and scale change for the Weibull 

distribution. We change only the first component of P to be consistent with the other 

two distribution family cases, so observation i is distributed as follows: 

~ -^Weib with p = 1, / < r 1 

(4.6) , , [for the MV Weibull case, 

with P = (l + A„l,...,l), i>Tl 

1. Multivariate Normal Case 


a. Changes in Location 

Tables 6-10 present power estimates for the multivariate normal scenarios 
under different alternatives. Tables 6-8 are associated with a mean change. One would 
expect the JJS test to be superior to nonparametric tests for the mean jump cases, as JJS is 
a parametric test based on the assumptions of multivariate normality and a single abrupt 
change in distribution mean. Our simulation results suggest that the JJS test is superior 
overall in both the jump and drift cases. However, both the SPM and NAP tests show 
appreciable power in each case, and even more noteworthy that the power of the ESPM 
test is comparable to JJS in some cases. In particular, when the change point occurs in 
the middle of the observation sequence (Tables 6 and 7), the JJS test and ESPM test 
perform comparably. When the change point is away from the middle of the observation 
sequence (Table 8) all the tests suffer somewhat, since fewer post-change data are present 
to indicate a change. However, the nonparametric tests seem to suffer more power loss 
than the JJS test. Eurthermore, the power of all four tests is reduced in higher dimension 
for changes of fixed magnitude (compare Table 6 with J = 5 to Table 7 with d = 20), 

89 



since the magnitude of the ehange beeomes a smaller fraetion of the average distanee 
between points as dimensionality inereases. This effeet appears to impaet the ESPM and 
JJS tests eomparably. 


F • d = 5- 
mean change 

Jump; T = 101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.06 

0.05 

0.04 

0.05 

0.25 

0.07 

0.05 

0.18 

0.14 

0.50 

0.10 

0.09 

0.60 

0.52 

0.75 

0.23 

0.18 

0.93 

0.93 

1.00 

0.41 

0.33 

1.00 

1.00 


Drift; r = 101 

SPM 

NAP 

ESPM 

JJS 

0.04 

0.05 

0.06 

0.07 

0.06 

0.05 

0.10 

0.09 

0.07 

0.05 

0.27 

0.22 

0.11 

0.09 

0.55 

0.53 

0.20 

0.16 

0.84 

0.85 


SPM; Sum of Pair-Maxima NAP: Non-Bipartite Aeeumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 6 , Test power to detect a mean change of magnitude A for MVN case 
with dimension d = 5 and change point r = 101 based on A = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to the 

right. 


F^^;d = 20; 
mean change 

Jump; T = 101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.05 

0.05 

0.03 

0.25 

0.07 

0.06 

0.11 

0.07 

0.50 

0.09 

0.07 

0.33 

0.20 

0.75 

0.11 

0.09 

0.71 

0.63 

1.00 

0.22 

0.16 

0.95 

0.95 


Drift; r = 101 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.05 

0.04 

0.05 

0.05 

0.07 

0.04 

0.07 

0.05 

0.13 

0.09 

0.08 

0.07 

0.31 

0.23 

0.11 

0.09 

0.56 

0.49 


SPM: Sum of Pair-Maxima NAP: Non-Bipartite Aeeumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 7. Test power to detect a mean change of magnitude A for MVN case 
with dimension J = 20 and change point r = 101 based on A = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to the 

right. 


90 







F ■d = 5- 
mean change 

Jump; T = 161 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.06 

0.06 

0.06 

0.25 

0.06 

0.05 

0.08 

0.08 

0.50 

0.07 

0.08 

0.22 

0.30 

0.75 

0.12 

0.10 

0.52 

0.72 

1.00 

0.19 

0.20 

0.81 

0.96 


Drift; r = 161 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.06 

0.05 

0.05 

0.05 

0.05 

0.06 

0.07 

0.06 

0.06 

0.10 

0.12 

0.06 

0.08 

0.15 

0.27 

0.05 

0.08 

0.28 

0.51 


SPM; Sum of Pair-Maxima NAP: Non-Bipartite Accumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 8, Test power to detect a mean change of magnitude A for MVN case 
with dimension d = 5 and change point r = 161 based on A = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to the 

right. 


b. Changes in Scale 

Tables 9 and 10 show power estimates for multivariate normal scale 
changes. Note that the case in Table 9 (d = 5, r = 101) varies from the case in Table 10 
(d = 20, T = 161)in both dimensionality and change point. We observe that the SPM 
and NAP tests demonstrate reasonable power to detect scale ehanges, while the ESPM 
test shows impressive power to do so. Reeall that the JJS test is not specifically designed 
to detect scale changes; however, it does. Interestingly, JJS exhibits its worst power 
among these scenarios in lower dimension with a 50-50 split and mean jump, and its best 
power in higher dimension with an 80-20 split and mean drift. 
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F ■d = 5- 
scale change 

Jump; T = 101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.06 

0.05 

0.05 

0.04 

0.25 

0.16 

0.11 

0.32 

0.09 

0.50 

0.51 

0.42 

0.97 

0.15 

0.75 

0.88 

0.88 

1.00 

0.19 

1.00 

0.99 

0.99 

1.00 

0.24 


Drift; r = 101 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.05 

0.05 

0.08 

0.08 

0.15 

0.14 

0.27 

0.20 

0.52 

0.27 

0.52 

0.46 

0.92 

0.42 

0.79 

0.77 

1.00 

0.54 


SPM; Sum of Pair-Maxima NAP: Non-Bipartite Accumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 9, Test power to detect a scale change of magnitude A for MVN case 
with dimension d = 5 and change point r = 101 based on A = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to the 

right. 


^mvn;^ = 20; 

scale change 

Jump; T = 161 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.06 

0.05 

0.04 

0.25 

0.08 

0.08 

0.24 

0.26 

0.50 

0.25 

0.28 

0.83 

0.66 

0.75 

0.56 

0.75 

1.00 

0.90 

1.00 

0.85 

0.99 

1.00 

0.98 


Drift; r = 161 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.06 

0.04 

0.06 

0.07 

0.10 

0.23 

0.09 

0.11 

0.25 

0.70 

0.17 

0.25 

0.55 

0.93 

0.28 

0.49 

0.86 

0.99 


SPM: Sum of Pair-Maxima NAP: Non-Bipartite Aecumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 10. Test power to detect a scale change of magnitude A for MVN case 
with dimension d = 5 and change point r = 161 based on A = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to the 

right. 
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2. 


Multivariate Normal Mixture Case 


a. Changes in Location 

Table 11 shows power results for the ease of an underlying multivariate 
normal mixture with =0.10 and cr^;^ =4 as an example of a heavy-tailed 
ease: 


SPM: Sum of Pair-Maxima NAP: Non-Bipartite Aoeumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 11. Test power to detect a mean change of magnitude A for MVN 
mixture case with dimension d = 5 and change point r = 101 based on 
A = 200, a = 0.05, and 1000 simulations. Jump case is to the left; drift case 
is to the right. Shading indicates excessive false alarm rate. 


Drift; r = 101 

SPM 

NAP 

ESPM 

JJS 

0.04 

0.04 

0.06 

0.28 

0.07 

0.06 

0.09 

0.26 

0.07 

0.07 

0.21 

0.33 

0.10 

0.09 

0.47 

0.39 

0.15 

0.12 

0.76 

0.55 


mean change 

Jump; T = 101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.05 

0.04 

0.27 

0.25 

0.07 

0.05 

0.14 

0.28 

0.50 

0.09 

0.08 

0.56 

0.38 

0.75 

0.20 

0.15 

0.88 

0.61 

1.00 

0.36 

0.25 

0.99 

0.85 


The matehing-based tests demonstrate results eomparable to their 
respective powers in the similar multivariate normal case (compare to Table 6) and they 
retain a false alarm rate consistent with test significance level. However, the false alarm 
rate for the JJS test far exceeds 5% and therefore disqualifies it for comparison at the 0.05 
test level. We explore this phenomenon in a separate study using 10,000 simulations 
(A = 200,50-50 split) and find that the JJS false alarm rates exceed test level even for 
small mixing probabilities. The problem gets worse as dimensionality increases, as 
shown in Eigure 29. 
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Effect of mixing proportion on J JS false alarm rate for nominal a = 0.05 and a- = 4 

I I nniY 



Figure 29, Effect of mixing proportion on JJS false alarm rate for MVN 
mixture cases with dimension d = 2,5, and 20 based on = 200, 

a = 0.05, and 10,000 simulations. Scale of mixing = 4 ; 
proportion of mixing varies along the jc-axis. 


b. Changes in Scale 

As in the multivariate normal case, Table 12 shows that the SPM and NAP 
tests have fair power to detect scale changes and the ESPM test has noteworthy power to 
do so. Again, excessive false alarm rate makes JJS an unacceptable test for these 
scenarios. These multivariate normal mixture cases of changing location and scale 
highlight the utility of nonparametric change-point approaches such as the SPM, NAP, 
and ESPM tests in that they are not limited by strict distributional assumptions. 
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scale change 

Jump; T = 101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.05 

0.05 

0.25 

0.25 

0.12 

0.09 

0.31 

0.29 

0.50 

0.38 

0.28 

0.92 

0.31 

0.75 

0.75 

0.69 

1.00 

0.32 

1.00 

0.93 

0.92 

1.00 

0.32 


Drift; r = 101 

SPM 

NAP 

ESPM 

JJS 

0.06 

0.06 

0.05 

0.29 

0.08 

0.08 

0.13 

0.32 

0.16 

0.13 

0.48 

0.39 

0.35 

0.27 

0.85 

0.45 

0.54 

0.45 

0.99 

0.50 


SPM; Sum of Pair-Maxima NAP: Non-Bipartite Accumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 12. Test power to detect a scale change of magnitude A for MVN mixture 
case with dimension d = 5 and change point r = 101 based on A = 200, 
a = 0.05, and 1000 simulations. Jump case is to the left; drift case is to the 
right. Shading indicates excessive false alarm rate. 


3. Multivariate Weibull Case 

Table 13 presents power results assoeiated with the multivariate Weibull 
distribution as an example of a skewed ease. Eor these simulations shape parameter 
?7 = 1.5 remains fixed while seale parameter P varies from 1 to 2 in the first dimension 
only; this eorresponds to eoineident ehanges in both loeation and seale. While false 
alarm rates for the JJS test are not as exeessive for this case as for the multivariate 
mixture ease, they still elearly violate the speeified 0.05 test level. As before, the 
matehing-based tests appear to respeet test level. 
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F ■ d = 5- 
P change 

A 

0.00 

0.25 

0.50 

0.75 

1.00 


Jump; T = 101 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.06 

0.09 

0.08 

0.06 

0.25 

0.22 

0.13 

0.10 

0.70 

0.70 

0.25 

0.20 

0.95 

0.96 

0.34 

0.27 

0.99 

1.00 


Drift; r = 101 

SPM 

NAP 

ESPM 

JJS 

0.06 

0.05 

0.05 

0.09 

0.06 

0.07 

0.12 

0.17 

0.08 

0.07 

0.35 

0.46 

0.15 

0.12 

0.70 

0.81 

0.23 

0.20 

0.86 

0.94 


SPM; Sum of Pair-Maxima NAP: Non-Bipartite Accumulated Pairs 

ESPM: Ensemble Sum of Pair-Maxima JJS: James, James, and Siegmund test 

Table 13. Test power to detect a change in the scale parameter (3 of magnitude 
A for MV Weibull case with dimension d = 5 and change point r = 101 
based on A = 200, a = 0.05, and 1000 simulations. Jump case is to the left; 
drift case is to the right. Shading indicates excessive false alarm rate. 


In summary, the Sum of Pair-Maxima (SPM), Non-Bipartite Accumulated Pairs 
(NAP), and Ensemble Sum of Pair-Maxima (ESPM) tests all demonstrate power to detect 
a change point in every examined case for different underlying different distributions, 
dimensionality, change-point loeation, ehange parameter, and type of ehange while 
achieving a signifieance level eonsistent with nominal test level. The power of each test 
is reduced as dimension inereases or as change-point loeation moves away from the 
middle of the observation sequenee. 

The ESPM test outperforms the SPM and NAP tests in every case and is 
preferable among the three tests for use. The ESPM test has power comparable to the 
parametric JJS test in the case of a mean change when the underlying distribution is 
multivariate normal, except perhaps when the change-point is far away from the center of 
the sequence. The ESPM test is preferable to the JJS test in non-normal cases due to 
excessive JJS false alarm rates, and is superior in deteeting scale changes when the 
underlying distribution is normal. The NAP test should be eonsidered for use if one 
desires information about the loeation of a ehange point in addition to detecting whether 
or not a change-point exists. 
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c. 


DIFFERENT COST FUNCTIONS 


To gain insight regarding the impaet of cost function selection we compare the 
performance of the ESPM test using Mahalanobis distance (MD), Euclidean distance 
(ED), Mahalanobis distance, robust (MD-R), and multivariate rank distance (RD) as 
defined in (2.15)-(2.23) for a few representative cases. We list MD in the first column as 
the reference cost measure which was used for all previous cases in the simulation study. 
Eor MD-R we set the nearest-neighbor parameter k (as identified in the discussion 
preceding equation (2.17) ) equal to 8, which is in the range of recommended values for 
that parameter given by Wang and Raferty (2002). 

In every case, MD and MD-R performance are nearly identical, and ED and RD 
performance are nearly identical. ED and RD perform as well or better than MD and 
MD-R; this performance difference is more evident in higher dimension and is most 
evident in for the multivariate Weibull case. These differences seem attributable to the 
fact that MD and MD-R must estimate the covariance of the underlying distribution, 
while for the cases we examine the underlying covariance is very close to the identity 
covariance assumed by ED and RD. 


^mvn;ESPM; 

mean change 

d = 

5; jump; r = 

101 

A 

MD 

ED 

MD-R 

RD 

0.00 

0.06 

0.06 

0.05 

0.06 

0.25 

0.16 

0.16 

0.16 

0.16 

0.50 

0.56 

0.58 

0.56 

0.57 

0.75 

0.93 

0.94 

0.93 

0.95 

1.00 

1.00 

1.00 

1.00 

1.00 


MD; Mahalanobis distance 
MD-R; Mahalanobis distance, robust 


d = 20; jump; r = 101 

MD 

ED 

MD-R 

RD 

0.05 

0.05 

0.05 

0.05 

0.11 

0.12 

0.11 

0.12 

0.31 

0.36 

0.30 

0.36 

0.72 

0.80 

0.71 

0.79 

0.96 

0.99 

0.96 

0.99 


ED; Euclidean distance 
RD; Multivariate rank distance 


Table 14. Ensemble Sum of Pair-Maxima (ESPM) test power to detect a mean 
change of magnitude A for MVN case with dimension d = 5 and change 
point r = 101 under different cost functions, based on = 200, a = 0.05, 
and 1000 simulations. Jump case is to the left; drift case is to the right. 
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^™x;ESPM; 
mean change 

d = 

5; jump; r = 

101 

A 

MD 

ED 

MD-R 

RD 

0.00 

0.05 

0.05 

0.05 

0.05 

0.25 

0.16 

0.15 

0.15 

0.16 

0.50 

0.46 

0.50 

0.49 

0.52 

0.75 

0.89 

0.91 

0.90 

0.90 

1.00 

0.99 

0.99 

0.99 

0.99 


MD; Mahalanobis distance 
MD-R; Mahalanobis distance, robust 


d = 20 ; jump; r = 101 

MD 

ED 

MD-R 

RD 

0.06 

0.05 

0.06 

0.06 

0.10 

0.09 

0.10 

0.09 

0.26 

0.34 

0.28 

0.33 

0.57 

0.69 

0.62 

0.69 

0.86 

0.95 

0.91 

0.95 


ED: Euelidean distance 
RD: Multivariate rank distanee 


Table 15. Ensemble Sum of Pair-Maxima (ESPM) test power to detect a mean 
change of magnitude A for MVN mixture case with dimension d = 5 and 
change point r = 101 under different cost functions, based on N = 200, 
a = 0.05 , and 1000 simulations. Jump case is to the left; drift case is to the 

right. 


F„,,;ESPM; 

/? change 

d = 

5; jump; r = 

101 

A 

MD 

ED 

MD-R 

RD 

0.00 

0.05 

0.06 

0.05 

0.05 

0.25 

0.24 

0.28 

0.25 

0.28 

0.50 

0.70 

0.77 

0.71 

0.76 

0.75 

0.93 

0.97 

0.95 

0.97 

1.00 

0.99 

1.00 

1.00 

1.00 


MD: Mahalanobis distance 
MD-R: Mahalanobis distance, robust 


d = 20 ; jump; r = 101 

MD 

ED 

MD-R 

RD 

0.05 

0.04 

0.05 

0.04 

0.14 

0.19 

0.15 

0.18 

0.46 

0.66 

0.47 

0.67 

0.73 

0.95 

0.77 

0.95 

0.93 

1.00 

0.95 

1.00 


ED: Euelidean distanee 
RD: Multivariate rank distance 


Table 16. Ensemble Sum of Pair-Maxima (ESPM) test power to detect a change 
in the scale parameter [3 of magnitude A for MV Weibull case with 
dimension d = 5 and change point r = 101 under different cost functions, 
based on N = 200, a = 0.05 , and 1000 simulations. Jump case is to the left; 

drift case is to the right. 
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D. COMPUTATIONAL DETAILS 


Simulations in Section B of this chapter were performed using R version 2.9.0 on 
the Hamming cluster of the Naval Postgraduate School’s High Performance Computing 
Center (HPC), which is a Sun Microsystems 6048 Blade modular system with 1152 
processing cores. We computed non-bipartite weighted matchings using Kolmogorov’s 
(2009) Blossom V algorithm. In its published form, Kolmogorov’s algorithm computes a 
single optimal non-bipartite matching on a set of N observations. We modified this 
routine slightly in the source language so that it computes a full sequence of orthogonal 
successive optimal matchings that rather than just a single matching. Table 17 shows 
typical realized runtimes to compute N /2 orthogonal successive optimal matchings 
calling Kolmogorov’s routine in R for various sample sizes. 


N 

Run time (sec) 

20 

<0.01 

50 

0.01 

100 

0.05 

200 

0.60 

300 

2.9 

400 

11 

500 

48 


Table 17. Typical time to compute N/2 orthogonal successive optimal 
matchings calling Kolmogorov’s algorithm (modified to compute orthogonal 
successive optimal matchings) in R, 


We realized significant reductions (at least two orders of magnitude) in total simulation 
time by taking advantage of the HPC’s batch job scheduling capability. 

Simulations in Section C were performed using R version 2.9.0 on a Windows XP 
machine with an Intel ® Pentium ® 4 3.4 GHz processor. We computed non-bipartite 
weighted matchings using Derigs’s (1988) algorithm. The algorithm itself is in the 
FORTRAN programming language; compiled code is embedded in a dynamic link 
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library file that can be called directly in R, provided by Professor Bo Lu at the Ohio State 
University. In its current form this algorithm requires that edge weights be integer¬ 
valued. For our purposes, we accommodated this requirement by rounding non-integer 
costs to four decimal places and then scaling each cost by 10"'. Additionally, this routine 
requires the assignment of a prohibitive cost to the diagonal of any cost matrix to block 
the pairing of an observation with itself For this study we set this prohibitive cost to be 
N ! 2 times the maximum of all interpoint costs. We used this same prohibitive cost for 
blocking to compute orthogonal successive optimal matchings. Table 18 shows typical 
realized runtimes to compute N! 2 orthogonal successive optimal matchings using 
Derigs’s algorithm in R for various sample sizes. 


N 

Run time (sec) 

20 

<0.01 

50 

0.02 

100 

0.2 

200 

2.9 

300 

16 

400 

56 

500 

143 


Table 18, Typical time to compute N12 orthogonal successive optimal 
matchings using Derigs’s algorithm in R, 


As mentioned previously, the theoretical runtime for existing algorithms to find a 
single optimal non-bipartite matching on a complete graph is 0{n^^. Our ESPM 

statistic involves computing N ! 2 successive optimal matchings on a graph, which can 
lead to lengthy run times for large sample sizes. Long runtimes for cases of very large 
sample size pose a practical limitation to the matching methods we propose. We discuss 
related research opportunities in the next chapter. 
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V. CONCLUSIONS AND OPPORTUNITIES FOR FURTHER 

RESEARCH 


In this dissertation, we introduee new nonparametrie matehing-based approaehes 
to the multidimensional ehange-point problem. These approaehes lead to effeetive 
ehange-point deteetion proeedures and highlight the potential value of matehing 
teehniques to more general statistieal applieations. Our review of the broad field of 
ehange-point deteetion reveals that this eontinues to be an area of aetive researeh and that 
robust multivariate approaehes to this problem remain few. Most existing approaehes 
make restrietive distributional assumptions (sueh as multivariate normality) or are limited 
to the single-test ease where the potential ehange point is pre-determined and the problem 
is the elassieal one of testing whether two samples of observations are from the same 
distribution. 

We propose four new ehange-point tests: the Sum of Pair-Maxima (SPM) test, the 
Non-Bipartite Aeeumulated Pairs (NAP) test, the Ensemble Sum of Pair-Maxima 
(ESPM) test, and the Bipartite Aeeumulated Pairs (BAP) test. The first three tests, 
designed to test for homogeneity among multivariate data when no observation history is 
available, all demonstrate power to deteet a ehange point under a variety of alternative 
hypotheses at fixed false alarm rates. The ESPM test utilizes additional ehange-point 
information available from many good (that is, low-weight) orthogonal matehings, and is 
superior among these nonparametrie tests to deteet a ehange point. Additionally, the 
ESPM test has power eomparable to a parametrie eompetitor, the JJS test, even when its 
parametrie assumptions are met. The power of the ESPM test not only establishes itself 
as an effeetive ehange-point test, but also validates matehing as a useful approaeh to the 
ehange-point problem. 

This researeh invites several possibilities for extension. One obvious question is 
whether or not any of these tests might be reasonably extended as sequential ehange- 
point tests. While it is diffieult in general to sequentialize hypothesis tests, sequential 
ehange-point deteetion teehniques would have valuable applieation. One requirement for 

sueh an extension would be to extend the theory presented here as neeessary for 
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sequentialization. Another practical problem associated with such an extension is the 
question of how to efficiently update an existing optimal matching on N observations 
with the addition of one or more data points to the observation set. 

Other opportunities include finding ensemble extensions for the NAP and BAP 
tests. The fact that the exact distributions of these individual tests are known might make 
the task of finding exact associated ensemble distributions (or good approximations) 
more tractable. Additionally, the performance of the BAP test remains to be evaluated. 

One challenge to research in this area is the scarcity of non-bipartite weighted 
matching software modules for typical statistical software applications. The simulation 
study for this research relies heavily on interfaces between C, C++, or FORTRAN; and 
S-PLUS®, R, or MATLAB® that we or others have built manually. The mainstreaming 
of any such interface would greatly enable broader related research. Research 
opportunities exist to improve the efficiency of non-bipartite matching algorithms. Even 
using existing algorithms, time improvements would be gained by reducing the number 
of orthogonal successive optimal matchings computed for the ESPM test. As presented 
here, the ESPM statistic is formed by summing over N /2 orthogonal successive optimal 
matchings where N is the sample size. Additional research is necessary to determine 
whether fewer (perhaps far fewer) orthogonal successive optimal matchings are adequate 
to achieve good detection power against alternate hypotheses. Also, it would be 
worthwhile to investigate the usefulness of “greedy” algorithms in this context. A greedy 
matching algorithm finds a good matching on N observations by pairing the two closest 
points, then the next two closest, and so on until a maximal matching has been 
constructed. This faster algorithm (O^A^j) does not in general provide an optimal non- 

bipartite matching. However, a greedy matching may still be good enough to provide 
valuable change-point information; we believe this would be a worthwhile area for study. 

Rosenbaum’s (2005) case for the consistency of the cross-match statistic seems 
quite reasonable, but as he states it is “admittedly informal.” His argument is also 
constrained to less general alternative hypotheses than we have considered in our work. 
Because our argument for the consistency of the SPM test (and therefore for the ESPM 
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test by direct extension) requires the consistency of the cross-match statistic, work needs 
to be done to establish the consistency of the cross-match statistic more formally and 
against alternative hypotheses of a more general nature. 

We alluded previously to the fact that machine health diagnosis and prognosis 
problems were an initial motivation for this research, and we are interested in ways to 
apply this work to that area. Such problems are often characterized by high 
dimensionality and serial correlation. In addition to detecting the presence of a change 
point in a sequence of observations, it would be useful also to estimate where in the 
sequence the change point occurred. Furthermore, it would be helpful in the event that a 
change point is detected to characterize the nature of the change (for example, abrupt or 
gradual) and the severity of the change for prognostic purposes such as estimating 
remaining useful life. 

An idea that seems worthy of consideration is a generalization of our matching 
approach to vertex groupings of cardinality greater than two. The tests we propose here 
are all matching-based, where we mean matching in the strict graph-theoretical sense as 
defined in Chapter II. Each matching is a collection of single edges, and each edge is in 
turn is a two-element subset of vertices. Algorithms to find optimal non-bipartite 
weighted matchings already exist, and we have demonstrated that matchings can be used 
for effective statistical inference. However, it might be worth examining whether 
collections of more than one edge (that is, collections of more that two vertices) might 
provide useful (or even better) information to the change-point problem. For example, 
instead of computing an optimal non-bipartite matching on a set of observations one 
might compute an optimal “three-grouping,” where the objective function for optimality 
might be to minimize the collective cycle cost or collective minimum spanning tree cost 
across subgroups of size three. Similar to the SPM test, one might consider the sum of 
group-maxima (or -minima, or -median, or some other unary set operator). Even more 
general “k-groupings” might be considered. Unlike the well-known method of k-means 
clustering, which partitions N observations into k groups (perhaps of different sizes) 
based on an objective criterion such as minimizing the sum if within-cluster differences, a 
“k-groupings” approach would specify group size k first and then collect vertices into 
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groups so as to minimize some particular criterion. We are not aware of any specialized 
algorithms to find such groupings and the associated statistical properties of such 
groupings are likely quite complicated, but these ideas might constitute fertile ground for 
research. 

Another interesting idea involves retaining some of the original observation 
information for the computation of a test statistic. The methods we propose use the 
observed data in two distinct steps: First we compute an optimal non-bipartite matching 
based on observation content excluding data sequence labels, then we compute a 
nonparametric test statistic based only on sequence labels with respect to that matching. 
However, it might be useful to carry over additional information from the data into the 
computation of a test statistic. For example, one might associate with each pair in the 
matching some measure of pair “quality” based on the cost of the pair. These quality 
values might then be used as weightings in the computation of a sum of pair-maxima- 
type statistic, and perhaps improve the detection power of such a test. 

Finally, an area upon which we have only touched briefly involves the choice of 
cost function. In research such as ours the existence of some appropriate dissimilarity 
measure associated with the sample space of interest is often assumed and from there the 
desired analysis proceeds. While our theoretical results regarding non-ensemble null 
distributions depend only on the exchangeability of sequence labels and not on choice of 
cost function, we expect detection power against alternative hypotheses to depend on that 
choice. While Mahalanobis distance (or some robust modification of Mahalanobis 
distance) is a natural choice of cost function for continuous random variables, cases of 
interest may include a mixture of categorical, ordinal, and continuous random variables. 
Even for continuous cases, the ability to detect change points with respect to a 
Mahalanobis distance function might be improved. For example, shifting observations 
by a component-wise smoothed mean can lead to better covariance matrix estimation in 
cases where a change point exists. In any case, further study of the effects of cost 
function choice on the power of tests presented here would be of useful, especially for 
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cases that include categorical or ordinal dimensions. In particular, for real-world 
application of these methods it would be worth investigating which cost functions lead to 
the most attractive power characteristics for the specific case at hand. 
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APPENDIX A: HIGHER MOMENT DERIVATIONS 


In this appendix, we derive various moments associated with the Sum of Pair- 
Maxima (SPM) statistic. 

A. MEAN AND VARIANCE OF 


For a non-bipartite match on N = 2n observations, each of the (2n )! possible 
assignments of ranks is equally likely under the null hypothesis, and the random variables 
are exchangeable. Therefore, T, takes on the value t when observation t is 

paired with some observation 5 of lesser sequence label and ( 5 ,?) is indexed as the first 


among the n pairs. But 
(E.l) 


P {t is paired with 5 } 


1 

2n-l 


and 

(E.2) P 1 ( 5 , t) is indexed first among matchesl t is paired with 5 j = — , 

k I J M 


SO 




(E.3) 


p|(5,t) is indexed first among matches \t is paired with , 
X P|t is paired with 5 } 


t-\ 


t -\ 1 1 

yl—^— =- 

^ n (2n-l) n{2n-\^ 


= ('-!) 


2n 

V 2 y 


,t = 2,...,2n 


The desired expressions follow directly: 


2n 




(t-l) _2(2n + l) 


^ n 


(=2 


(E.4) 


(2n-l) 3 

(t-l) _ (2n-I- 1 ) (3n-I- 1 ) 
n(2n — 1 ) 3 

Var\Y.\ = E\Yf\-E\Y^f = (^” + *)(”~*) 




t=2 
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B. COVARIANCE OF AND 


First we find P{Yj^=t\Y^ = by observing that for the ease t <s , 

Z('- 2 ) 


P(Y,^t\Y,^s) = 


u=\ 


'2n-2'^ 

-1 

^2n^ 

^-1 

f2n-2'^ 

-1 

^2n^ 

V 2 y 


v2y 

u=t+l 

1 2 y 


V 2 y 


(E.5) 


(.- 1 ) 


2n 

V 2 y 




(5 -l)(n -l)(2n -3) 




and for the ease t> s . 


P{Y,=t\Y,=s)^ 


Z(^- 3 ) 


'2n-2^ 

-1 

^2n^ 

V 2 y 


V 2 y 


(E.6) 


(.- 1 ) 

t-3 


(2n \ ' 


V 2 y 


t = s + \,...,2n. 


(n-l)(2n-3) ’ 

Now eondition on Y^ to eompute : 

r , 1 ^ (t-l)(5-3) ^ t-3 

E yjFi = 5 =yt-- -'V, ’ -r+ y ty-yy-y 

S (5-l)(n-l)(2n-3) ,=,+1 (n-l)(2n-3) 


(E.7) 


5(5-2)(5-3) ^ r4n(2n + l)(n-2) 5(5 + l)(5-4) 


3(n-l)(2n-3) 


so 


(E.8) 


4n(2n + l)(n -2)-25(5 -5) 

~ 3(n-l)(2n-3) 

4n(2n+ l)(n-2)-2j'(5'-5) 


2n 

3(n-l)(2n-3) 
8(2n + l)(5n + 2) 

45 ■ 


!(2n -l) 
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Therefore, 


Coy%Y^] = E[Y^Y^]-E[Y^]E[Y^ 


(E.9) 

8(2n + l)(5n + 2) 

' 2 ( 2 n + l)' 


45 

3 

/ 


4(2n + l) 



45 


C. 

THIRD MOMENTS OF 1^, E 2 , AND E 3 



As in the first and seeond moment ealculations, eompute direetly: 

E[y’] = t‘’ -P. + l)(24„"+15„ + l) 

f=2 


"^‘n(2n-l) 15 

Now compute conditioning on T, using (E.7): 


(E.ll) 


£fi"r]=£[i'r£[r,|i',]] 

4n(2n + l)(n-2)-25(5-5) 


^=2 


3(n-l)(2n-3) 
4(2n + l)(l5nVl0n + l) 


!(2n -l) 


45 

In a similar fashion, we apply a series of conditioning arguments to compute 
£[ 7172 ^ 3 ]- Eirst, let andZ^^^ take on the values of E^Ej, and Ej such that 

< Z^ 2 ) < -2(3) (these Z^^.j are unrelated to the Z; of the BAP test). Then 


E[YJJ,] = E 


7 7 7 

(1) (2) (3). 


= E 


7 E 


7 E 


71 ) 


7 7 

^(2)’^(3) 


T3) 


A direct 


combinatorial argument gives 

(E.12) p(z,., = ,„Z„, = <,,Zp, = <,) = 2 ^ ^ 2 «, 



'2n-2^ 

'2n-A^ 

v2. 

V 2 , 

V 2 , 
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so 


2n-2 


(E.13) 


ti=2 

_3{t,-\){t,-2){t,-3){t,-5) 


( 2n3f 2n — 2'\( 2n — A\ 




, 4 < ?2 < L < 2n, 


and 


(E.14) 


2«-l 

—4 


P{Z,,, =?J= > E(Z^2) =h’^{3) = h 


) 


3(i,-l)(L-2)(i,-3)(i,-4)fe-5) 

r2nV2n-2V2n-4^ 


v^yv ^ yv ^ y 


6 < E < 2n. 


Therefore, 


(E.15) 


^ (^( 1 ) “ ^1 


Z(2) - ?2 5 Z^2j - ^3 1 - 


E ( Z^jj , Z^2) ^2 ’ -^(3) ^3 ^ 

^-^(2) “ ^2’-^(3) “ ^3) 


2(?,-l) 

(?2-1)(?2-2)^ 


2 < < ?2 < ?3 ^ 2n, 


and 


(E.16) 


^ I ^(2) “ ^2 


^(3) - E 1 “ 


-^(^(2) ^2’^(3) 


'(^m * 1 ) 


4fe-l)(<.-2)fe-3) 

('3-l)('j-2)('3-3)('3-‘t)' 
Now compute conditional expected values 


4 < ^2 < ^3 ^ 2n. 


41 ) 


2(2) -^2.Z(3) -^3 


(,-l 


Zjj, tj 


r,=2 


2(2) “ ^2 ’ 2(3( - ^3 


(E.17) 


_ 2tj (t, 1 ) 

■;^(t2-l)(t2-2) 


2l 


no 



and 


(E.18) 


Finally, 


(E.19) 


1-1 

7 

L ( 1 ) 

1-1 

N 

fN 

N 

2(3) ^3J 

= E 

27^ 

^^(2) 

3 

2(3) ^3 


3 (,,-l)(,3-2)(!,-3)(t-4) 
45 


E[YJJ,] = E 


7 F 


7 F 


^(1) 


7 7 

^(2)’^(3) 


-(3) 


= E 


44 (^^( 3 ) 

45 


4 ^ 3 ^ ( 5 ^ 3 - 1 ) 3 (^ 3 - 1 )(^ 3 - 2 )(^ 3 - 3 )(^ 3 - 4 )(^ 3 - 5 ) 


(,=6 


45 


2nV 2n-2V 2n-4 




16(2n + l)(70nV49n + 6) 
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APPENDIX B: QUANTILE TABLES 


A. APPROXIMATE CRITICAL VALUES FOR 


a 

N ^6 

8 

10 

12 

14 

16 

18 

20 

22 

0.001 

- 

- 

- 

43 

58 

76 

97 

120 

145 

0.005 

- 

- 

31 

44 

60 

79 

99 

123 

149 

0.01 

- 

- 

31 

45 

61 

80 

101 

125 

151 

0.025 

- 

21 

32 

46 

63 

81 

103 

127 

154 

0.05 

- 

21 

33 

47 

64 

83 

105 

129 

156 

0.1 

13 

22 

34 

48 

65 

85 

107 

132 

159 

a 

24 

26 

28 

30 

32 

34 

36 

38 

40 


173 

204 

237 

272 

310 

351 

394 

440 

489 


178 


242 

279 

317 

359 


449 

498 

0.01 

180 

211 

245 

282 

321 

363 

407 

454 

503 

0.025 

183 

215 

249 

286 

326 

368 

413 

460 

510 

0.05 

186 

218 

253 

290 

330 

373 

418 

466 

516 

0.1 

189 

222 

257 

295 

335 

378 

424 

472 

523 

a 

60 

80 

100 

120 

140 

160 

180 

200 

220 

0.001 

1113 

1995 

3137 

4537 

6199 

8121 

10304 

12749 

15455 

0.005 

1131 

2023 

3175 

4588 

6262 

8199 

10397 

12857 

15581 

0.01 

1140 

2036 

3194 

4612 

6293 

8236 

10442 

12910 

15641 

0.025 

1152 

2056 

3221 

4648 

6338 

8292 

10508 

12987 

15731 

0.05 

1163 

2073 

3244 

4679 

6377 

8339 

10564 

13054 

15807 

0.1 

1176 

2092 

3272 

4715 

6422 

8394 

10630 

13130 

15896 

a 

240 

260 

280 

300 

320 

340 

360 

380 

400 

0.001 

18424 

21655 

25148 

28903 

32922 

37203 

41747 

46554 

51624 

0.005 

18567 

21816 

25328 

29103 

33142 

37444 

42009 

46838 

51931 

0.01 

18636 

21894 

25415 

29200 

33248 

37560 

42136 

46976 

52080 

0.025 

18737 

22008 

25543 

29342 

33404 

37732 

42323 

47179 

52299 

0.05 

18825 

22107 

25653 

29464 

33539 

37879 

42483 

47353 

52487 

0.1 

18925 

22220 

25780 

29604 

33694 

38049 

42668 

47553 

52703 


Table 19. Estimated critical values for . 


Critical regions correspond to values of strictly less than the appropriate 
quantile; ^ is sample size and is significance level. Values for N = 6, 8, and 10 are 
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exact; values for N >\0 are approximations by Edgeworth expansion using (3.31). A 
dash entry means that the significance level of interest cannot be attained for that 
sample size. 


B. APPROXIMATE CRITICAL VALUES FOR 


N = 20 

0.001 

0.005 

0.01 

0.025 

0.05 

0.1 


jV^40 

0.001 

0.005 

0.01 

0.025 

0.05 

0.1 


jV^60 

0.001 

0.005 

0.01 

0.025 


d = 1 

2 

3 

4 

5 

10 

20 

50 

2.57 

2.40 

2.27 

2.17 

2.10 

1.93 

1.88 

1.78 

1 . 04 ] 

1 . 03 ] 

[. 04 ] 

[. 02 ] 

1 . 02 ] 

1 . 02 ] 

[• 02 ] 

[• 02 ] 

1.96 

1.88 

1.81 

1.76 

1.72 

1.60 

1.56 

1.52 

1 . 02 ] 

1 . 02 ] 

[. 02 ] 

[. 01 ] 

1 . 02 ] 

1 . 02 ] 

[• 01 ] 

KOI] 

1.72 

1.66 

1.60 

1.56 

1.53 

1.46 

1.43 

1.38 

1 . 02 ] 

1 . 01 ] 

[. 01 ] 

[. 01 ] 

1 . 02 ] 

1 . 01 ] 

KOI] 

[• 01 ] 

1.38 

1.35 

1.33 

1.31 

1.29 

1.24 

1.22 

1.21 

KOI] 

1 . 01 ] 

[. 02 ] 

[. 01 ] 

1 . 02 ] 

[<• 01 ] 

[• 02 ] 

KOI] 

1.13 

1.12 

1.10 

1.10 

1.09 

1.07 

1.07 

1.03 

KOI] 

1 . 02 ] 

[. 01 ] 

KOI] 

1 . 01 ] 

KOI] 

[• 01 ] 

[• 02 ] 

0.90 

0.90 

0.89 

0.89 

0.89 

0.88 

0.88 

0.86 

KOI] 

KOI] 

KOI] 

[• 01 ] 

1 . 01 ] 

[• 01 ] 

[• 02 ] 

[• 01 ] 

d=i 

2 

3 

4 

5 

10 

20 

50 

2.77 

2.57 

2.42 

2.31 

2.22 

2.02 

1.95 

1.84 

[• 04 ] 

[. 04 ] 

[. 03 ] 

[• 04 ] 

[• 03 ] 

[• 02 ] 

[• 02 ] 

[• 02 ] 

2.11 

1.99 

1.90 

1.83 

1.78 

1.66 

1.62 

1.56 

1 . 02 ] 

[. 02 ] 

[. 01 ] 

1 . 02 ] 

1 . 01 ] 

[• 01 ] 

[• 01 ] 

[• 01 ] 

1.83 

1.74 

1.68 

1.63 

1.59 

1.50 

1.47 

1.43 

1 . 01 ] 

[. 01 ] 

[. 01 ] 

1 . 01 ] 

1 . 01 ] 

[• 01 ] 

[• 01 ] 

[• 01 ] 

1.47 

1.42 

1.38 

1.35 

1.33 

1.28 

1.26 

1.24 

1 . 01 ] 

[. 01 ] 

[. 01 ] 

1 . 01 ] 

1 . 01 ] 

[• 01 ] 

[• 01 ] 

KOI] 

1.20 

1.17 

1.15 

1.14 

1.13 

1.10 

1.09 

1.08 

1 . 01 ] 

[. 01 ] 

KOI] 

KOI] 

KOI] 

KOI] 

[<• 01 ] 

KOI] 

0.93 

0.92 

0.92 

0.91 

0.91 

0.91 

0.91 

0.90 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

d=l 

2 

3 

4 

5 

10 

20 

50 

2.80 

2.61 

2.46 

2.35 

2.26 

2.05 

1.97 

1.86 

1 . 04 ] 

[. 05 ] 

[. 04 ] 

1 . 03 ] 

1 . 03 ] 

[• 02 ] 

[• 02 ] 

[• 02 ] 

2.15 

2.02 

1.93 

1.86 

1.80 

1.68 

1.64 

1.57 

1 . 02 ] 

[. 02 ] 

[. 02 ] 

1 . 01 ] 

1 . 01 ] 

[• 01 ] 

[• 01 ] 

[• 01 ] 

1.85 

1.76 

1.70 

1.65 

1.62 

1.53 

1.50 

1.44 

1 . 01 ] 

[. 01 ] 

[. 01 ] 

1 . 01 ] 

1 . 01 ] 

[• 01 ] 

[• 01 ] 

[• 01 ] 

1.47 

1.43 

1.39 

1.37 

1.35 

1.30 

1.28 

1.25 

1 . 01 ] 

[. 01 ] 

[. 01 ] 

1 . 01 ] 

1 . 01 ] 

[• 01 ] 

KOI] 

KOI] 
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0.05 


1.20 

1.18 

1.16 

1.15 

1.14 

1.11 

1.10 

1.09 

1.01] 

1.01] 

KOI] 

KOI] 

KOI] 

KOI] 

t<.01] 

KOI] 

0.94 

0.93 

0.93 

0.93 

0.92 

0.92 

0.92 

0.92 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 


>80 

0.001 

0.005 

0.01 

0.025 

0.05 

0.1 


N is sample size, d is dimension, and a is signifieanee level. Critieal values 
are listed with assoeiated standard error in square braekets. Critieal regions eorrespond to 
values of strietly greater than the appropriate quantile. Critieal values are eomputed 

by 100,000 simulations for eaeh ease of sample size and dimension using uniformly 
distributed data and matehing with respeet to Euelidean distanee. Standard error for 
quantiles is determined by the Maritz-Jarrett method (Maritz and Jarrett, 1978). 
Simulation suggests that these eritieal value approximations are independent of 
underlying distribution and eost function. 

Interpolate to find critical values for , d , ov a not provided in the table. Use 
d = 50 to approximate critical values for J > 50. For sample size or dimension far 
outside the bounds of these tables, critical values ought to be approximated by simulation 
for the case at hand. 


d= 1 

2 

3 

4 

5 

10 

20 

50 

2.88 

2.67 

2.50 

2.38 

2.29 

2.06 

1.98 

1.88 

1.06] 

1.03] 

1.03] 

1.03] 

1.03] 

1.02] 

1.02] 

1.02] 

2.15 

2.04 

1.96 

1.89 

1.84 

1.70 

1.65 

1.59 

1.02] 

1.02] 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.86 

1.78 

1.72 

1.67 

1.63 

1.54 

1.50 

1.45 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.49 

1.45 

1.41 

1.38 

1.36 

1.31 

1.29 

1.26 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

1.01] 

KOI] 

1.21 

1.19 

1.18 

1.16 

1.15 

1.13 

1.11 

1.10 

1.01] 

1.01] 

1.01] 

t<.01] 

t<.01] 

t<.01] 

t<.01] 

KOI] 

0.95 

0.94 

0.94 

0.94 

0.93 

0.93 

0.93 

0.92 

t<.01] 

KOI] 

KOI] 

KOI] 

t<.01] 

t<.01] 

t<.01] 

KOI] 


Table 20, Approximate critical values for . 
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APPENDIX C: R FUNCTIONS FOR CRITICAL ENVELOPES 


In this appendix, we provide R (2005) language code to compute critical 
envelopes for the Non-Bipartite Accumulated Pairs (NAP) and Bipartite Accumulated 
Pairs (BAP) tests. 

A. CRITICAL ENVELOPE FOR THE NAP TEST 

function(alpha, n) { 

# alpha = non-simultaneous alpha value (rejection for 

# exceeding a critical threshold) 

# n = number of pairs (N = 2*n is total sample size) 

# Returned is a list of critical boundary values, and the 

# probability of violating at least one of them. Boundary 

# values themselves are not critical (e.g. reject the null 

# if any value is strictly greater than the boundary value). 
nl <- n - 1 

N <- 2 * n 
Nl <- N - 1 
qvec <- numeric(Nl) 
for (k in 2:Nl) { 

rmin <- max(c(0, k - n) ) 
rvec <- rmin:floor(k/2) 

cvec <- cumsum(choose(n, k - rvec) * choose(k - rvec, rvec) 

* 2^(k-2 * rvec))/choose(N, k) 
qvec[k] <- which(cvec > (1-alpha - le-010)) [1] + rmin-1 

} 

qvec[1] <- 1 

kstar <- max(which(qvec < 1:N1)) 

a <- rep(0, n) 

a [1:2] <- 1 

qv <- qvec[2] 

for (k in 3:kstar) { 

aO <- a * ((0:nl) <= qv) 
qv <- qvec[k] 
qvl <- qv + 1 

if (qv > 0) a[2:qvl] <- a0[2:qvl] - (2 * (diff(aO[1:qvl]) * 

(1: qv)) ) / k 

} 

rvec <- max(c(0, kstar - n)):qv 

cvec <- (choose (n, kstar - rvec) * choose (kstar - rvec, rvec) * 

2^(kstar - 2 * rvec))/choose(N, kstar) 
alpha.sim <- 1 - sum(a[rvec + 1] * cvec) 

return(list(kseq <- 2:N1, envelope = qvec[-l], alpha.sim = 
alpha.sim)) 

} 
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B, 


CRITICAL ENVELOPE FOR THE BAP TEST 


function(alpha, m, n) { 

# alpha = non-simultaneous alpha value (rejection for 

# exceeding a critical threshold) 

# m = number of control points 

# n = number of test points (must have n > m) 

# Returned is a list of critical boundary values, and the 

# probability of violating at least one of them. Boundary 

# values themselves are not critical (e.g. reject the null 

# if any value is strictly greater than the boundary value) 
if(n <= m) { 

cat("*** Invalid arguments ***", "\n") 

return () 

} 

qvec <- qhyper(l - alpha, m, n - m, 1:(n - 1)) 
sq <- which(diff(c(0, qvec)) < le-010) 

pvec <- (((m - qvec[sq])/(n - sq + 1)) * dhyper(qvec[sq], m, n 

m, sq))/phyper(qvec[sq], m, n - m, sq) 
return(list(envelope = qvec, alpha.sim = 1 - prod(l - pvec))) 

} 
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APPENDIX D: EXAMPLE DATA FOR FIGURES 1-7 


id. 

x2 

0.8057 

0.2209 

-1.3556 

-1.0061 

0.1209 

-0.4531 

-0.2222 

1.3995 

0.5717 

-0.4620 

-0.3001 

0.0327 

1.1343 

0.7988 

-0.1794 

0.8968 

-1.4671 

0.1379 

1.3953 

-1.6191 

0.4408 

-1.6466 

0.5654 

0.4287 

-0.6936 

-0.7372 

0.8339 

0.5649 

-2.2374 

-1.3842 

1.0976 

0.4603 

-0.0016 

0.6294 

-1.6146 

0.3798 

-1.2287 

-1.0133 

0.2074 

-0.3472 


Table 21. Example data for Figures 1-7 
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